ABSTRACT


An empirical orthogonal function (EOF) representation of relative vorticity is used 
to forecast recurvature (change in storm heading from west to east of 000° N) of western 
North Pacific tropical cyclones. The time-dependent coefficients of the first and second
EOF eigenvectors vary in a systematic manner as the tropical cyclone recurves around
the subtropical ridge and tend to cluster about the same values at recurvature time. In
contrast, the coefficients for straight-moving storms tend to cluster in a different region
in EOF space. Exploiting this Euclidean distance approach, additional EOF coefficients 
are identified that best represent the vorticity fields of recurving and straight-moving 
storms. Classification of an individual case is then into the closest time-to-recurvature 
in 12-h intervals or straight-moving storm category as measured in multidimensional 
EOF space. Although rather subjective, the Euclidean method demonstrates skill
relative to climatological forecasts. A more objective discriminant analysis technique is also
tested. A final version that involves the first six EOF coefficients of the 250 mb vorticity
field is useful (72% correct) in identifying recurvers or straight-movers during the 72-h
forecast period. Skill in classifying situations within 12-h time-to-recurvature groups is
low, but might be improved using other analysis techniques or in combination with other 
predictors. 
I. INTRODUCTION


Tropical cyclones have formidable destructive power and annually exact tremendous 
losses in lives and property. The western North Pacific Ocean is the most active tropical 
cyclone basin in the world. An average of 31 tropical cyclones have occurred annually 
during the 25-year period ending in 1984 (ATCR 1984). The damage from these storms 
can be minimized only through preparedness and avoidance. Precautionary measures 
can require considerable time. Therefore, accurate storm forecasts are critically
important to both the military and civilian communities.


A. BACKGROUND 

Tropical cyclones can be classified into three broad categories based on their track. 
If a storm moves west or northwest throughout its life, it is classified as a straight-mover 
(TY Agnes in Fig. 1). A storm that turns from a westward or northwestward path 
through North to a northeastward track is defined as a recurver (ST Vanessa in Fig. 1). 
Storms that do not fit either the straight-mover or the recurver categories are classified
as odd-movers (ST Bill in Fig. 1). Odd-mover tracks are typically erratic and may dis- 
play loops or a stairstep-type track. The largest forecast errors occur when recurving 
storms had been forecast to move straight toward the west or northwest, or when 
straight-movers had been forecast to recurve to the north or northeast. Incorrect
recurvature forecasts result in 72-h track forecast errors of over 1850 km (1000 n mi)
almost every year (Sandgathe 1987). Situations associated with recurvature, due either to
cyclone-midlatitude trough interaction or to cyclone-subtropical ridge interaction, are
listed among the Joint Typhoon Warning Center’s (JTWC’s) most difficult forecast
problems (Sandgathe 1987). Since nearly half of all western North Pacific tropical
cyclones eventually recurve, these recurvature forecast questions are frequently faced by 
operational forecasters. 

None of the present objective forecast aids in operational use are specifically
designed to identify recurvature situations. Leftwich (1979) and Lage (1982) used
regression analysis techniques to predict recurvature, which they defined as a net
displacement north of 315° during the forecast period. Leftwich (1979) included
position, motion, intensity and geopotential height predictors to forecast the
probability of recurvature in Atlantic tropical cyclones. Geopotential heights were
represented by gridpoint values on a relocatable storm-centered grid. Leftwich concluded
Fig. 1. Examples of 1984 tropical cyclones classified by track type. Straight-mover,
TY Agnes (dashed line); recurver, ST Vanessa (dotted line); and odd-mover,
ST Bill (solid line). 


that the inclusion of synoptic predictors improved the model forecast skill, but none of 
his statistical models out-performed climatological forecasts. Lage (1982) used an
empirical orthogonal function (EOF) representation of 500 mb geopotential height fields
plus persistence-related variables to predict western North Pacific tropical cyclone re- 
curvature or non-recurvature at 36-, 54- and 72-h forecast intervals. The combination 
of persistence plus EOF predictors consistently out-performed the persistence alone or 
the EOF predictors only methods. Each of these three techniques was superior to 
climatology and chance at all forecast times. 

The purpose of this study is to test the feasibility of using an EOF representation 
of the synoptic vorticity fields at 700, 400 and 250 mb to identify recurvature situations
in western North Pacific tropical cyclones. Because horizontal pressure gradients are 
generally weak and geostrophic relationships deteriorate in the tropics, geopotential 
heights provide a poor estimate of the steering flow. Since vorticity combines the
steering effects of both zonal and meridional winds, it should provide a more accurate
measure of steering with fewer predictors than would be required if the two components
of the wind were used separately as predictors. An EOF representation of a synoptic
field such as vorticity offers several important advantages over gridpoint values. Because
EOF predictors represent spatial patterns in environmental fields, they contain more 
synoptic information and are less affected by observational errors. Because relatively 
few EOF predictors are required to represent large amounts of variance in synoptic 
patterns, considerable savings in computer storage and forecast model run times can be 
realized using this method. 

EOF predictors have been used successfully in statistical-synoptic models to forecast 
tropical cyclone motion (Shaffer and Elsberry 1982; Peak et al. 1986; Schott et al. 1987; 
and Elsberry et al. 1988). Shaffer (1982) demonstrated the usefulness of EOF represen- 
tation of 500 mb geopotential heights as synoptic forcing predictors in statistical- 
synoptic track prediction schemes. In a similar study, Wilson (1984) used EOF 
representation of 700, 400 and 250 mb wind component fields to forecast tropical 
cyclone motion. Schott (1985) stratified forecast situations by the cyclone direction of
motion to develop a statistical adjustment scheme involving EOF predictors that reduced
the systematic errors in a dynamical track prediction model. Meanor (1987) used
Schott’s stratification scheme and EOF predictors of vertical wind shear to develop a
similar model to adjust for systematic errors in a dynamical track prediction model.
Weniger (1987) adopted Meanor’s EOF predictors of vertical wind shear to develop a
successful tropical cyclone intensity forecast model. Gunzelman (1990) used the EOF
approach as a filter to represent the “signal” in the vorticity field, and suggested that
several different forecast situations could be interpreted as an advection of these filtered
vorticity fields.


B. OBJECTIVE 

The objective of this study is to demonstrate the ability of an EOF representation
of the synoptic vorticity field to identify potential recurvature situations. The hypothesis 
is that the adjacent synoptic features cause the turning motion that leads to tropical 
cyclone recurvature. Consequently, the sets of EOF coefficients for the vorticity fields
associated with recurvature should be different from those associated with straight-track 
situations. The question is, how far in advance of recurvature are the recurvature EOF
coefficients distinguishable from the straight-track EOF coefficients? Classification 
goals are two-fold: first, to identify the overall track type as a recurver versus a 
straight-mover, and second, to identify the time to recurvature with the best possible
time resolution. Recurvature is defined here as the time when the storm heading changes
from west of 000° North to east of 000° North. A track segment will be classified as a
straight-mover if the storm does not recurve during the next 72 h, which corresponds to 
the official JTWC forecast period. The time to recurvature will be specified in 12-h in- 
crements. In summary, the first goal of the study is to determine whether the present 
vorticity field is representative of a recurvature situation within 72 h versus that of a 
straight-mover; if so, the second goal is to specify the most likely time to recurvature. 

Two methods are used to develop the classification model. In the Euclidean distance 
approach, classifications are into the group that has the closest mean EOF predictor 
values as measured in multidimensional space. This simple approach provides physical 
insight into the classification problem. The difficulty is in determining which predictors 
best separate the groups. Therefore, a discriminant analysis package also is used to 
more objectively demonstrate the predictive capabilities of an EOF representation of the 
vorticity field. 














II. DATA AND METHODS


A. DATA DESCRIPTION 

The cases in this study are 12-hourly data for western North Pacific tropical 
cyclones during 1979-1984. These cases are a combination of the cases analyzed by 
Wilson (1984), Peak et al. (1986) and Gunzelman (1990). Wilson and Peak et al. ex- 
tracted the Global Band Analyses (GBA) wind fields for each case. Gunzelman com- 
puted the relative vorticity from these wind fields and performed the EOF analyses of the 
vorticity fields. The following restrictions were applied to the selection of cases: 


• a tropical cyclone attaining at least tropical storm strength (maximum sustained
winds of 18 m s⁻¹ (35 kts) or greater);

• a best track position west of the dateline, east of 100° E and south of 34.6° N; and

• the meridional and zonal wind components of the GBA are available at 700, 400
and 250 mb.
A total of 1573 cases met these requirements and were analyzed. 
1. Field description 
The GBA wind fields are operationally generated every 12 h by the United 
States Navy Fleet Numerical Oceanography Center (FNOC). The GBA provide global 
longitudinal coverage between 40.956° S and 59.745° N. The analyses are produced on
a Mercator grid with spacing of 2.5° latitude at 22.5° N and S. Although zonal and 
meridional wind fields also are available at the surface and 200 mb, only the 700, 400 
and 250 mb levels are used. Analyses are based on surface observations, ship reports,
rawinsondes, pibals, aircraft reports and satellite-derived cloud motion vectors. When 
a tropical cyclone is present, eight bogus winds are inserted at the surface 80 km (43 n 
mi) from the center of the cyclone, and are coupled vertically via the thermal wind 
equation using temperature analyses at the intermediate levels. A detailed description 
of the GBA is contained in the U.S. Naval Weather Service (1975). 
Wilson (1984) and Peak et al. (1986) used a bi-linear interpolation scheme to
interpolate the zonal (u) and meridional (v) GBA wind components onto a storm-centered
grid with 17 gridpoints north-to-south and 31 gridpoints east-to-west. The center of the cyclone,
based on the JTWC warning position, is always located at gridpoint (16,9). Gunzelman 
(1990) computed relative vorticity 


$$\zeta = \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y} \qquad (2.1)$$


using centered finite differences at the internal gridpoints, and one-sided differences at 
the grid boundaries. Gunzelman noted that the mean vorticity fields are nearly vertical 
near the tropical cyclone, and the largest positive vorticity values around the cyclone are 
at 700 mb and decrease with height. The 700 mb vorticity field also has the largest dif- 
ference between the positive values associated with the cyclone and the negative values 
associated with the subtropical ridge, and the gradient decreases with height. However, 
the magnitude of the vorticity associated with the subtropical ridge increases with height 
and is greatest at 250 mb. 
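
The differencing described above can be sketched in a few lines of NumPy. This is only an illustration, not Gunzelman's code: the array layout (rows increasing northward, columns increasing eastward), the uniform grid spacing, and the function name `relative_vorticity` are all assumptions introduced here.

```python
import numpy as np

def relative_vorticity(u, v, dx, dy):
    """Relative vorticity dv/dx - du/dy on a regular grid.

    u, v   : 2-D arrays shaped (ny, nx); rows are assumed to increase
             northward (y) and columns eastward (x).
    dx, dy : grid spacing in metres (assumed uniform here).
    """
    # np.gradient applies centered differences at interior points and
    # one-sided differences along the array edges.
    dv_dx = np.gradient(v, dx, axis=1, edge_order=1)
    du_dy = np.gradient(u, dy, axis=0, edge_order=1)
    return dv_dx - du_dy

# Check on solid-body rotation, u = -omega*y and v = omega*x,
# whose vorticity is 2*omega everywhere.
ny, nx, spacing, omega = 17, 31, 250.0e3, 1.0e-5
y, x = np.meshgrid(np.arange(ny) * spacing, np.arange(nx) * spacing, indexing="ij")
zeta = relative_vorticity(-omega * y, omega * x, spacing, spacing)
```

Because the test wind field is linear in x and y, both the centered and the one-sided differences are exact, so every gridpoint returns 2 x 10⁻⁵ s⁻¹.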
2. Empirical orthogonal function analysis 

The EOF method used by Gunzelman (1990) paralleled the procedures used by 
Wilson (1984) and Meanor (1987), except that it was applied to relative vorticity rather 
than wind components or the vertical wind shear. The EOF analysis was performed on the
527-point storm-centered grid (Section II.A.1) and was based on the same 682 dependent
cases during 1979-1983 that Wilson used. 

In this method, orthogonal eigenvectors and their associated eigenvalues
(coefficients) are calculated from the dependent set vorticity fields at each pressure level.
First, 527 eigenvectors are calculated from a normalized 527 x 682 matrix of
the 527 gridpoint values for the 682 cases. The original synoptic gridpoint values
can be recovered by the linear summation of the products of the eigenvectors and their
associated coefficients. The first eigenvector (spatial pattern) contains the largest
variance. The second eigenvector contains the largest amount of the variance not explained
by the first, and so on. Once the eigenvectors are determined from the dependent data
set, the time dependence in the synoptic pattern for each case is contained in the EOF 
coefficients. 
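
As a rough illustration of the decomposition and reconstruction just described, the sketch below computes EOFs with a singular value decomposition. The normalization (removing the case mean and dividing by the standard deviation at each gridpoint) is an assumption, since the exact normalization is not stated here, and the names `eof_analysis` and `reconstruct` are hypothetical.

```python
import numpy as np

def eof_analysis(fields):
    """EOFs of a (gridpoints x cases) matrix, e.g. 527 x 682.

    Returns eigenvectors (columns), fraction of variance per mode,
    per-case coefficients, and the normalization constants.
    """
    mean = fields.mean(axis=1, keepdims=True)
    std = fields.std(axis=1, keepdims=True)
    std[std == 0.0] = 1.0                      # guard against constant gridpoints
    z = (fields - mean) / std                  # assumed normalization
    # Left singular vectors are the orthonormal spatial patterns,
    # ordered from largest to smallest explained variance.
    eigvecs, s, _ = np.linalg.svd(z, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)
    coeffs = eigvecs.T @ z                     # one column of coefficients per case
    return eigvecs, var_frac, coeffs, mean, std

def reconstruct(eigvecs, coeffs, mean, std, n_modes):
    """Sum of eigenvector x coefficient products over the first n_modes."""
    z_hat = eigvecs[:, :n_modes] @ coeffs[:n_modes]
    return z_hat * std + mean

# Small synthetic example (random numbers, not vorticity data).
rng = np.random.default_rng(0)
fields = rng.normal(size=(12, 40))
eigvecs, var_frac, coeffs, mean, std = eof_analysis(fields)
full = reconstruct(eigvecs, coeffs, mean, std, n_modes=12)
truncated = reconstruct(eigvecs, coeffs, mean, std, n_modes=4)
```

With all modes retained the original gridpoint values are recovered; truncating to the leading modes keeps the largest-variance patterns and discards the rest.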

One of the advantages of the EOF representation is that a relatively small 
number of EOF eigenvectors can be used to represent a synoptic pattern. To determine 
the minimum number of eigenvectors that are needed to represent the signal in the
vorticity field, Gunzelman (1990) applied the Preisendorfer and Barnett (1977) Monte 
Carlo technique to distinguish between eigenvectors with signal vice those with noise. 
In this method, eigenvalues for the physical data are compared to eigenvalues for
randomly generated data. If the physical eigenvalue deviates significantly from the 
eigenvalue computed from a random vorticity field, there is reasonable assurance that 
the associated eigenvector is describing signal rather than noise. Based on Gunzelman’s 
(1990) results, the first 45 vorticity modes are retained as potential descriptors of the 
synoptic fields associated with recurvature in this study. The first 45 modes explain
between 72.8 and 77.5% of the variance in the vorticity fields at the three pressure levels (Table 1).
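
The idea of comparing physical eigenvalues with those of random data can be sketched as follows. This is a simplified stand-in for the Preisendorfer and Barnett procedure, not a reproduction of it: the Gaussian random fields, the 95th-percentile threshold, the number of trials, and the function names are assumptions made for illustration.

```python
import numpy as np

def variance_fractions(data):
    """Fraction of variance per EOF mode for a (gridpoints x cases) matrix."""
    anomalies = data - data.mean(axis=1, keepdims=True)
    s = np.linalg.svd(anomalies, compute_uv=False)
    return s**2 / np.sum(s**2)

def significant_modes(data, n_trials=200, percentile=95.0, seed=0):
    """Count leading modes whose variance fraction exceeds that of random data."""
    rng = np.random.default_rng(seed)
    observed = variance_fractions(data)
    random_fracs = np.array(
        [variance_fractions(rng.normal(size=data.shape)) for _ in range(n_trials)]
    )
    threshold = np.percentile(random_fracs, percentile, axis=0)
    exceeds = observed > threshold
    # Keep only the unbroken run of leading modes above the noise level.
    n_keep = len(exceeds) if exceeds.all() else int(np.argmin(exceeds))
    return n_keep, observed, threshold

# Synthetic test: two orthogonal patterns with large amplitude plus unit noise.
rng = np.random.default_rng(1)
k = np.arange(20)
p1 = np.sin(2 * np.pi * k / 20); p1 /= np.linalg.norm(p1)
p2 = np.cos(2 * np.pi * k / 20); p2 /= np.linalg.norm(p2)
data = (5.0 * np.outer(p1, rng.normal(size=100))
        + 3.0 * np.outer(p2, rng.normal(size=100))
        + rng.normal(size=(20, 100)))
n_keep, observed, threshold = significant_modes(data)
```

In this synthetic case the two imposed patterns stand well above the random spectrum while the remaining modes fall below it, which is the behavior the test is meant to detect.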


Table 1. PERCENTAGE OF EXPLAINED VARIANCE WITH 1 TO 45 
MODES: Cumulative percentage of variance (95% confidence) with 1 to 
45 EOF modes retained for the relative vorticity fields at three pressure 
levels (after Gunzelman 1990).





Each eigenvector consists of 527 values that represent a spatial pattern on the 
31 x 17 analysis grid. The magnitude of the associated time-dependent EOF coefficient 
indicates the relative importance of that pattern in each specific case. A negative EOF 
coefficient indicates that the identical spatial pattern applies, except that the maxima 
and minima are reversed. 

The first eigenvector for 700 mb vorticity (Fig. 2) can be interpreted as a tropical 
cyclone in the subtropical ridge if this pattern is multiplied by a negative coefficient. 
For example, the 700 mb Mode 1 coefficient for ST Vanessa at recurvature time is -4.57.
Therefore, the opposite pattern with a positive vorticity value at the storm center (dot) 
applies, and represents a recurving tropical cyclone at the axis of the subtropical ridge. 
Mode 1 eigenvectors for 400 and 250 mb relative vorticity (not shown) represent 
large-scale patterns similar to the Mode 1 pattern for 700 mb in Fig. 2. As the spatial
patterns become increasingly more complex for higher mode eigenvectors, the patterns 
become increasingly more dissimilar among the three pressure levels (see Gunzelman 
1990 for further discussion). 





Fig. 2. Mode 1 eigenvector at 700 mb. Positive (negative) values are solid 
(dashed). North latitude is along the y-axis and east longitude is along the x-axis. 
The black dot indicates the storm center position (after Gunzelman 1990). 


Reconstructed 700 mb vorticity fields for ST Vanessa at recurvature time using 
only the first 45 EOF modes and all 527 modes are compared in Fig. 3. The basic
pattern of a tropical cyclone at the axis of the subtropical ridge with a strong vorticity
gradient to the east, and cyclonic vorticity associated with the midlatitude trough to the 
north, is represented equally well with 45 EOF modes as with all 527 modes. The
addition of the higher EOF modes adds smaller scale features, which are assumed to
represent noise in the vorticity field.


B. SELECTION OF CASES 

A recurvature forecast model learning set is selected from the 1573 cases in the
1979-1984 data set. As a first step in the selection process, the data are categorized by
track type and time to recurvature. Initial identification of the cases as recurvers,
straight-movers and odd-movers is based on the tropical cyclone track categories
Fig. 3. Reconstructed 700 mb vorticity for ST Vanessa at recurvature. Relative
vorticity contours (10⁻⁵ s⁻¹) reconstructed from the first 45 EOF modes (top) and
from all 527 modes (bottom). Positive (negative) values are solid (dashed). North
latitude is along the y-axis and east longitude is along the x-axis. The black dot
indicates the storm center position.











assigned by Miller et al. (1988). For each recurving tropical cyclone, the storm heading 
between successive 6-h JTWC best track positions is computed and the recurvature time 
is identified as the 00 or 12 UTC nearest the 6-h interval in which the storm heading 
changed from west of 000° North to east of 000° North. This synoptic map time, for 
which a GBA is available to calculate the vorticity EOF coefficients, will be referred to 
as R-00h where R indicates recurvature and -00h indicates the number of hours (0) prior 
to recurvature time. Recurver cases within 96 h of recurvature are then categorized 
based on the time to recurvature into the R-96h through R-00h classification groups. 
Cases more than 96 h prior to recurvature are identified as pre-recurvers (PR). Cases 
after recurvature are excluded from the forecast model learning set. The straight-mover 
cases are identified as non-recurvers (NR) if a minimum of 72 h remains in the track to 
establish that recurvature does not follow in that time. This requirement excludes from 
the learning set all straight-mover cases that cannot be verified as non-recurvature situ- 
ations throughout a 72-h forecast period. Odd-mover cases (382 cases from 33 tropical 
cyclones) are not included in the model learning set, but will be used to test the ability 
of the final EOF recurvature forecast model to classify these cases into the straight- 
mover or recurver group that most closely describes the storm motion. 
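
The heading test and the time-to-recurvature labels described above can be sketched as below. The flat-earth heading formula, the function names, and the sample track are illustrative assumptions; snapping the result to the nearest 00 or 12 UTC map time is left out for brevity.

```python
import numpy as np

def headings(lat, lon):
    """Heading (degrees clockwise from north) between successive positions."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    dlat = np.diff(lat)
    mid_lat = np.radians(0.5 * (lat[:-1] + lat[1:]))
    dlon = np.diff(lon) * np.cos(mid_lat)      # shrink longitude steps with latitude
    return np.degrees(np.arctan2(dlon, dlat)) % 360.0

def recurvature_interval(lat, lon):
    """Index of the first interval whose heading is east of north
    after an interval with heading west of north; None if no such change."""
    h = headings(lat, lon)
    for i in range(1, len(h)):
        if h[i - 1] > 180.0 and 0.0 < h[i] < 180.0:
            return i
    return None

def time_to_recurvature_group(hours_before):
    """Label a case by hours before recurvature (multiples of 12 assumed)."""
    if hours_before < 0:
        return None                            # post-recurvature cases are excluded
    if hours_before > 96:
        return "PR"
    return f"R-{int(hours_before):02d}h"

# Hypothetical track: northwestward motion, then a turn to the northeast.
lat = [15.0, 16.0, 17.0, 18.0, 19.0, 20.0]
lon = [135.0, 134.0, 133.0, 132.5, 133.0, 134.0]
interval = recurvature_interval(lat, lon)
```

For this made-up track the heading swings from about 335° to about 25° in the fourth interval (index 3), so that interval would be taken as containing recurvature.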

After screening, a total of 782 cases from 97 storms are retained in the model 
learning set (Table 2). Although the learning set cases in the Euclidean distance
approach and the discriminant analysis approach differ, the entire learning set will be used
to compare the overall prediction skill of the approaches. 


C. CRITERIA FOR EVALUATING MODEL PERFORMANCE 

Evaluation criteria are chosen to test the forecast model’s ability to meet the two 
classification goals: identification of track type and identification of the time to recur- 
vature. Since no objective guidance is available (or official forecast is issued) as to 
whether a storm will be a recurver or a straight-mover, the only absolute measure of
usefulness is a comparison with a climatological forecast of recurvature. 

1. Percent correct 

The percent of cases correctly forecast as recurver (%R) or straight (%S) and 

the total correctly forecast in both track type categories (%T) test the model's ability
to identify the overall track type. The percent correct is calculated for recurver and
straight-track types defined by the times in Table 2. That is, a classification into any
of the R-72h through R-00h groups is considered to be a correct forecast of a recurver.
Similarly, classification into the NR, PR and R-96h through R-84h groups represents
Table 2. RECURVATURE MODEL LEARNING SET CASES BY FORECAST
CATEGORY: Number of 1979-1984 tropical cyclones that are categorized
as recurver or straight track types. The recurver learning set is defined
as those times within 72 h of recurvature time (R-00h). The straight
learning set includes all times preceding 72 h of recurvature time, plus
selected times from the straight-track storms. The number of cases retained
in the model learning set is listed for each track-type category and for each
12-h forecast category.


correct forecasts of a straight-track situation, because the tropical cyclone did not
recurve during the 72-h forecast period. The simple percent correct measure is also used
in evaluating the time-to-recurvature prediction performance of the model. In that case, 
only a classification into the appropriate time-to-recurvature group will be credited as a 
correct forecast. 
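
A minimal sketch of the track-type percent correct follows. The group labels mirror those used in the text; the function names and the six sample cases are invented for illustration only.

```python
# Groups counted as a recurver forecast: R-00h through R-72h.
RECURVER = {f"R-{h:02d}h" for h in range(0, 73, 12)}

def track_type(group):
    """'R' for recurvature within 72 h, otherwise 'S' (NR, PR, PRNR, R-84h, R-96h)."""
    return "R" if group in RECURVER else "S"

def percent_correct(verification, forecast):
    """Return (%R, %S, %T) for paired verification and forecast group labels."""
    pairs = list(zip(verification, forecast))

    def pct(kind):
        subset = [(v, f) for v, f in pairs if track_type(v) == kind]
        if not subset:
            return float("nan")
        hits = sum(track_type(f) == kind for _, f in subset)
        return 100.0 * hits / len(subset)

    total_hits = sum(track_type(v) == track_type(f) for v, f in pairs)
    return pct("R"), pct("S"), 100.0 * total_hits / len(pairs)

# Hypothetical cases, not results from this study.
verification = ["R-00h", "R-24h", "R-72h", "NR", "PR", "R-96h"]
forecast = ["R-12h", "PRNR", "R-48h", "PRNR", "R-84h", "R-60h"]
pct_r, pct_s, pct_t = percent_correct(verification, forecast)
```

Note that the R-00h case forecast as R-12h counts as correct for track type even though it would count as a miss under the stricter time-to-recurvature measure.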
2. Classification matrix scores 

Classification matrix scores assign penalty points to misclassifications as a linear 
function of the number of 12-h categories between the prediction and the verification 
groups. That is, one additional penalty point is assigned for each 12-h group between 
the model forecast and the verification. Since a misclassification of a recurvature case 
into the PRNR forecast group represents a larger error, two additional penalty points 
are assigned in the PRNR category relative to the R-96h category. Because this is a 
penalty score, higher skill is represented by numbers close to zero. A penalty score of
1.0 would indicate that the average misclassification is off by one category. 

Three classification matrix scores are defined based on the matrix of penalty 
points in Table 3: D-score (dependent); I-score (independent); and R-score (recurver). 
Given a classification matrix that contains the number of cases that are forecast in each 
classification group (columns) and verify in each verification category (rows), the penalty 
points in Table 3 are assessed by multiplying the number of cases by the penalty points 
for that error. No penalty points are given to the correct classifications along the 
diagonal.


Table 3. MATRIX OF PENALTY POINTS FOR CLASSIFICATION MATRIX 
SCORES: Penalty points are assessed for erroneous forecasts of time- 
to-recurvature in 12-h increments or as PRNR. These penalty points are 
summed over three subsets to calculate the classification matrix D-, I- and 
R-scores. The matrix columns (forecast model classification groups) and 
rows (case verification categories) are the same as those in the model 
classification matrix. 
The three classification matrix scores are obtained by multiplying the classification
matrix of model results, element by element, by the penalty point matrix and calculating three sums of
the products. These sums are then normalized by the number of cases in the sample so
that the scores can be compared for different sample sizes. The three classification
matrix scores examine various aspects of the forecast model skill by scoring only cases that
belong to certain verification categories. The classification matrix I-score includes cases
in all of the verification categories (R-00h through R-96h plus PR and NR) that are in
the independent sample. The D-score is designed to compare results from dependent
and independent samples, which contain different sets of cases. Since the PR cases are
not always included in the dependent set to define a PRNR classification group, the PR
case forecasts are excluded in the classification matrix D-score. Consequently, the D- 
score and I-score will have similar magnitudes, with an offset that is proportional to the 
performance of the forecast model on the PR cases. The D- and I-score will provide an 
exact comparison of forecast skill only if the ratio of the combined number of PR and 
NR cases to the combined number of R-00h through R-96h cases is the same for both 
data sets (e.g., in the learning set). Since the PR and NR cases are assigned more
penalty points for misclassifications than the R-00h through R-96h cases, the relative number
of cases in each group will affect the matrix scores that score PR and NR forecasts. The
R-score is an indication of the model's ability to correctly identify the time to
recurvature in recurver cases. That is, the penalty scores in Table 3 are only summed over the
R-00h through R-72h verification categories. 
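
The penalty scoring can be sketched as follows. The penalty matrix is rebuilt from the verbal rule (one point per 12-h group, plus two extra points for PRNR relative to R-96h) rather than copied from Table 3, and a single PRNR row stands in for the separate PR and NR verification rows; both are assumptions, as is normalizing by the number of cases in the scored rows.

```python
import numpy as np

GROUPS = [f"R-{h:02d}h" for h in range(0, 97, 12)] + ["PRNR"]

# Positions on the penalty scale: R-00h..R-96h at 0..8, PRNR two steps past R-96h.
position = np.array(list(range(9)) + [10], dtype=float)
PENALTY = np.abs(position[:, None] - position[None, :])

def matrix_score(counts, rows=None):
    """Average penalty per case over the chosen verification rows.

    counts : (10 x 10) array, rows = verification category,
             columns = forecast classification group.
    rows   : indices of verification rows to score (default: all).
    """
    counts = np.asarray(counts, dtype=float)
    rows = list(range(len(GROUPS))) if rows is None else list(rows)
    return (counts[rows] * PENALTY[rows]).sum() / counts[rows].sum()

# Every recurver case forecast one 12-h group too early: R-score of 1.0.
counts = np.zeros((10, 10))
for i in range(7):
    counts[i, i + 1] = 5
r_score = matrix_score(counts, rows=range(7))

# All cases on the diagonal: no penalty.
perfect = matrix_score(np.eye(10) * 3)
```

Under these assumptions, an R-00h case forecast as PRNR costs ten points, while an R-96h case forecast as PRNR costs two.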
3. Climatological forecasts and scores 

A climatological forecast is obtained by counting the number (N in Table 4) of 
JTWC best track 00 and 12 UTC positions for 1979-1984 cyclones of tropical storm 
strength or greater in each classification group (R-00h through R-96h plus PRNR). 
Thus, the climatology data set contains all the learning set cases, plus additional cases
that were excluded from the learning set because either the best track position did not
meet the requirements in Section II.A or the GBA wind fields were not available at all
three pressure levels. The percentage of recurving (41.7), straight-moving (36.4) and 
odd-moving storms (21.9) for these six years is representative of the percentages (42.5, 
36.4 and 21.1, respectively) for the 28-year period 1945 to 1987 (Miller et al. 1988). 

To obtain the climatological forecast classification matrix (Table 4), a fraction 
of the learning set cases in each 12-h verification category are forecast into each of the 
ten classification groups based on the percent of climatological cases in each of the ten 
groups (percent in Table 4). By ignoring the straight-mover cases with less than 72 h
remaining and the odd-mover cases, these climatological forecasts can be compared to 
the model forecasts predicated on similarly screened data. The skill scores for the 
climatological forecasts of the learning set cases are given in Table 5. Any forecast 
model should have higher percent correct and lower D-score, I-score and R-score to be 
considered as useful to the forecaster. 
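
The construction of the climatological forecast matrix amounts to spreading each verification category's cases across the classification groups in proportion to the climatological frequencies. A minimal sketch, with made-up counts and three groups instead of ten, is:

```python
import numpy as np

def climatological_matrix(verif_counts, clim_counts):
    """Expected classification matrix for a climatological forecast.

    verif_counts : number of learning set cases in each verification category.
    clim_counts  : number of climatology cases in each classification group.
    Each row of the result distributes that category's cases according to
    the climatological relative frequency of the groups.
    """
    verif_counts = np.asarray(verif_counts, dtype=float)
    freq = np.asarray(clim_counts, dtype=float) / np.sum(clim_counts)
    return np.outer(verif_counts, freq)

# Illustrative numbers only (not the Table 4 values).
matrix = climatological_matrix([10, 20, 70], [1, 1, 2])
```

Leaving the entries unrounded, as here, corresponds to computing the scores from the actual fractional numbers of cases rather than from rounded integers.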





Table 4. CLIMATOLOGICAL FORECASTS FOR THE LEARNING SET: The
learning set cases belonging to each verification category are classified into
the ten classification groups with the relative frequency (column labeled
percent) that the cases in the 1979-1984 climatology data set belong to
each of the ten classification groups. Since the number of classifications
is rounded to the nearest whole integer, the total is 780 vice 782.

Table 5. FORECAST SKILL FOR CLIMATOLOGICAL FORECASTS: Percent
of recurver (%R), straight (%S) and total (%T) cases correctly classified
according to track type. D-, I- and R-score are classification matrix scores
that indicate skill in correctly classifying cases with 12-h accuracy. Scores
are computed from the actual number of learning set cases that
occur in each group vice the integer values presented in
Table 4.
III. EUCLIDEAN DISTANCE METHOD


A. BACKGROUND 

The Euclidean distance approach in this section examines both the physical changes 
in the vorticity patterns that precede tropical cyclone recurvature and the ability to dis- 
tinguish among these patterns using an EOF representation. Since the time-dependent 
EOF coefficients represent the synoptic fields that exist at each time, the coefficients 
should vary in a systematic manner as the tropical cyclone moves around the subtropical 
ridge during recurvature. Simple two-dimensional plots of the first and second EOF 
coefficients on the x- and y- axes in Fig. 4 indicate that these coefficients for the 1984
recurvers have similar traces. The Mode 1 coefficients are initially positive, which indi- 
cates a large-scale positive vorticity pattern centered along the latitude of the storm 
center in the first eigenvector (Fig. 2) and represents the synoptic pattern while these 
storms are still located in the monsoon trough. As these storms move northward out 
of the monsoon trough and recurve, the magnitude of the Mode 1 coefficients decreases 
and then becomes negative to represent the negative vorticity associated with the sub- 
tropical ridge. At the time of recurvature, the first and second EOF coefficients for the 
1984 recurvers tend to cluster in the same region on the two-dimensional plot. In con- 
trast, the 1984 straight-moving cyclones have EOF coefficients that cluster in a separate 
region, and the odd-moving cyclones have coefficients that exhibit characteristics of both 
the recurvers and straight-movers (Fig. 5). This leads to the hypothesis that an individ- 
ual cyclone may be distinguished as a recurver (straight-mover) if the EOF coefficients 
for that cyclone are closer to the mean of the cluster associated with recurvers 
(straight-movers). The questions are how far in advance of recurvature can these dif- 
ferences in EOF coefficients be detected and with what time accuracy. 


B. MODEL DEVELOPMENT 

To test the hypothesis that individual cases may be classified according to the 
closeness of their EOF coefficients to the mean values for the recurver and straight sets, 
a classification model is developed using the Euclidean distance method. The Euclidean 
distance (D) is calculated in multidimensional EOF space using the formula 


$$D = \sqrt{(\alpha(a) - \bar{\alpha}(a))^2 + \ldots + (\alpha(i) - \bar{\alpha}(i))^2}, \qquad (3.1)$$

Fig. 4. Time progression of the first and second EOF coefficients for 1984
recurvers. Markers indicate the values of the first (x-axis) and second (y-axis) EOF
coefficients of the 700 mb vorticity fields for all 12-hourly cases analyzed by
Gunzelman (1990). Values at recurvature time are circled and arrow heads mark the
last case in each storm sequence (see legend for storm number). The start and end
cases for ST Vanessa (storm number 25 during 1984 is denoted 2584) are
labeled.


where $\alpha$ is the EOF coefficient for the case, $\bar{\alpha}$ is the mean of the EOF coefficients for the
forecast classification group, and the indices a through i represent the EOF modes used
as predictors. Separate distances are calculated relative to the mean EOF value of each
Fig. 5. Time progression of the first and second EOF coefficients for 1984
straight-movers and odd-movers. As in Fig. 4, except for 1984 straight-moving 
storms (left) and odd-moving storms (right). 


potential classification group, and then the classification is into the group with the
smallest distance.
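
The nearest-group rule can be sketched in a few lines. The group means, the case coefficients, and the choice of modes below are hypothetical numbers chosen only to exercise the function; `classify` is not a name used in this study.

```python
import numpy as np

def classify(case_coeffs, group_means, modes):
    """Assign a case to the group with the smallest Euclidean distance.

    case_coeffs : 1-D array of EOF coefficients for the case.
    group_means : dict mapping group label -> 1-D array of mean coefficients.
    modes       : zero-based indices of the EOF modes used as predictors.
    """
    modes = list(modes)
    distances = {
        label: float(np.sqrt(np.sum((case_coeffs[modes] - mean[modes]) ** 2)))
        for label, mean in group_means.items()
    }
    best = min(distances, key=distances.get)
    return best, distances

# Hypothetical group means and case, using only the first two modes.
group_means = {
    "R-00h": np.array([-4.0, 1.0, 0.5]),
    "PRNR": np.array([3.0, -1.0, 0.0]),
}
case = np.array([-3.0, 0.5, 2.0])
label, distances = classify(case, group_means, modes=[0, 1])
```

Here the case lies about 1.1 units from the R-00h mean and about 6.2 units from the PRNR mean, so it is classified as R-00h. Changing `modes` changes which coefficients enter the distance, which is why predictor selection matters in this approach.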
1. Forecast group means 

Two issues in the development of this simple model are the selection of the 
representative recurver and straight-mover cases to calculate the classification group
means, and the specification of the set of EOF modes that best distinguishes between the 
recurver and straight-mover situations. A “clean” set of 15 recurving and 15 straight- 
moving storms is selected from the 1979-1984 data set in hopes of identifying the most 
representative vorticity patterns for the classification categories. The following criteria 
are used to select the clean sets: 


• a tropical cyclone attaining at least typhoon strength (maximum sustained winds
of 33 m s^-1 (65 kts) or greater);


• formation east of 130° E; and


• a typical recurver or straight track exhibiting no significant deviations.

Tracks for the clean set storms are shown in Fig. 6. Because the clean storms
exhibit typical recurver- or straight-track motion, the EOF coefficients for these storms
should be representative of the typical vorticity patterns associated with recurver- or
straight-track motion.

The mean values of the first 45 time-dependent EOF coefficients are computed 
from the clean recurver set at times R-00h through R-96h and from the clean straight- 
mover set that is labeled NR. As in the time progressions of the first two EOF coeffi- 
cients for the 1984 storms in Figs. 4 and 5, considerable variability exists around the 
12-hourly mean coefficients even in these clean set storms (not shown). To obtain more 
representative transitions among the time-to-recurvature groups in EOF space, a run- 
ning mean value is calculated from three times centered on the desired time. For ex- 
ample, the mean for recurvature time R is calculated from the EOF coefficients at 
R-12h, R-00h and R+12h. The NR group averages also are calculated from three 
consecutive 12-hourly cases. These cases are selected so that the average longitude of 
the clean set straight-mover cases (132.06° E) is close to the average longitude of the
clean set recurvers at recurvature time (130.99° E). Although only straight-mover data
are used to define the PRNR classification group, the Euclidean distance approach 
should distinguish straight-moving cases (NR) as well as recurving storm cases that are 
more than 96 h before recurvature (PR). 
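
The three-time running mean can be sketched as follows. The text does not say whether individual cases were pooled or the three 12-hourly means were averaged; this sketch assumes pooling of the cases. The function name, the keying by hours before recurvature and the values are hypothetical.

```python
from statistics import fmean

def pooled_running_mean(coeffs_by_time, center, step=12):
    """Mean of all case coefficients at center - step, center and center + step hours."""
    pooled = []
    for t in (center - step, center, center + step):
        pooled.extend(coeffs_by_time.get(t, []))
    return fmean(pooled)

# Hypothetical Mode 1 coefficients keyed by hours before recurvature
# (-12 stands for R+12h); values are illustrative only.
mode1 = {-12: [2.0, 4.0], 0: [3.0, 5.0], 12: [1.0, 3.0], 24: [0.0, 2.0]}
r00_mean = pooled_running_mean(mode1, center=0)    # uses R+12h, R-00h, R-12h
r12_mean = pooled_running_mean(mode1, center=12)   # uses R-00h, R-12h, R-24h
print(r00_mean, r12_mean)
```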

Vorticity fields at each pressure level (700, 400 and 250 mb) reconstructed from 
the mean EOF coefficients for each classification group (Figs. 7, 8 and 9) illustrate the 
evolution of the synoptic patterns associated with recurvature. These patterns are sim- 
ilar at all three levels. The sequence starts with the NR pattern in which the subtropical 
ridge is well defined by the broad anticyclonic (negative) vorticity center to the north of 
the cyclone center. Such a pattern would be expected to produce westerly or 
northwesterly storm motion and a straight-type track. At R-96h, the anticyclonic 
vorticity associated with the subtropical ridge is weaker to the north and stronger to the 
northeast of the storm center than it was in the NR pattern. Proceeding toward recur- 
vature time, the cyclonic (positive) vorticity associated with the storm and the 
anticyclonic vorticity associated with the subtropical ridge increase in magnitude as the 
composite “clean-set storm” moves north-northwest around the ridge. At recurvature
time, the storm center position is at the axis of the ridge and only a relatively weak
region of anticyclonic vorticity is found between the storm and the midlatitude cyclonic
vorticity to the north. The differences among the recurvature patterns at the three

Fig. 6. Clean sets of recurvers and straight-movers for the Euclidean distance
approach. JTWC best tracks for the 15 recurving (top) and 15 straight-moving
(bottom) clean set storms during 1979-1984. These storms are used to calculate the 
mean time-dependent EOF coefficients that identify the recurvers and the straight- 
movers for the Euclidean approach. 

Fig. 7. Reconstructed 700 mb vorticity fields for clean set composites. Relative
vorticity contours (1078s) are reconstructed from the means of the first 45 EOF
modes for the clean set storms at R-00h (top left), R-24h (top right), R-48h (middle
left), R-72h (middle right), R-96h (bottom left), and NR (bottom right). Positive 
(negative) values are solid (dashed). North latitude is along the y-axis and east 
longitude is along the x-axis. The black dot indicates the storm center position. 


pressure levels are similar to the relative vorticity differences with height noted by 
Gunzelman (1990), as described in Section II.A.1.

Fig. 8. Reconstructed 400 mb vorticity fields for clean set composites. Time intervals
and contours are similar to Fig. 7.

2. Predictor modes 


The objective is to select the set of EOF predictors that best separates the
time-to-recurvature and PRNR classification groups, as defined by the clean set mean
values in multidimensional space. Since the Euclidean distance approach offers no
objective selection criteria, such as the F-to-enter and other statistics in regression and
discriminant analysis packages, the final choice of predictors will be based on model
classification skill. Although the initial tests are conducted for all three pressure levels,
the Euclidean model development is presented here for 700 mb data only.


Fig. 9. Reconstructed 250 mb vorticity fields for clean set composites. Time intervals
and contours are similar to Fig. 7.

Potential predictors are first screened by the ability to distinguish the clean set
recurving cases from the straight-moving cases (NR) at each 12-h time (R-00h through 
R-96h). In this procedure, the clean set cases in one time-to-recurvature group (plus or 
minus 12 h) and all straight-moving storm cases are classified as recurvers (straight- 
movers) if the Euclidean distance is closer to the clean set mean values for the 
time-to-recurvature group (PRNR group). The skill in identifying the storm type is 
expressed as the percent correctly classified. 

To illustrate the importance of the choice of the predictor set modes, clean set 
classifications into recurvers versus straight-movers using 200 randomly selected sets of 
ten EOF modes are compared in Fig. 10. One hundred sets of ten modes are selected 
randomly from the first 45 EOF modes (top) and 100 sets are formed from EOF Mode 
1 plus nine other randomly selected modes (bottom). The combined skill in distin- 
guishing between recurving and straight-moving storms is better than 50% for all times 
before recurvature for all random sets. The highest skill is achieved when EOF Mode 1 
is forced and ranges from 95% at R-00h to about 80% at R-48h to R-96h. However, 
the skill among the random sets varies by as much as 40 percentage points. In the tests 
with ten randomly selected predictors (top, Fig. 10), notably better classification skill in 
the R-00h through R-36h groups also is achieved when Mode 1 is included. These
results illustrate the importance of Mode 1 in distinguishing recurving storm vorticity
fields near recurvature time. However, the remaining EOF predictors are necessary to
discern the R-48h through R-96h recurving storm vorticity fields from the
straight-mover fields. The problem is how to determine the optimum set of predictors without
having to evaluate all possible permutations of the first 45 EOF modes. 

Since the optimum set of predictors must be able to distinguish recurver and 
straight vorticity fields at all 12-h time steps before recurvature, the set should consist 
of some combination of the modes that best distinguish at each of the individual times 
before recurvature. Thus, recurver versus straight-mover classifications are evaluated for 
each of the 45 EOF modes separately for each 12-h time group. For each time-to- 
recurvature group, the first 45 modes are ranked as potential predictors in the order of 
their individual skill. Then a prototype set of predictors is formed from the two predic- 
tors with the highest individual skill. If the skill for this set is greater than when only 
the highest individual predictor is included, the second predictor is retained in the set. 
This stepwise process is continued by including the individual predictor with the next 
highest skill until the 45th best EOF mode is evaluated. In each step, the new predictor 
is only retained if the percent correct classifications is increased over the previous step. 
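
The stepwise screening described above can be sketched as follows for a single time-to-recurvature group versus the straight-mover group. This is an illustration under stated assumptions, not the original implementation: the helper names, the two-group setup and the small data set are hypothetical, and the strict "greater than" test corresponds to the retention rule in the paragraph above.

```python
import math

def nearest_group(case, means, modes):
    """Label of the group mean closest to the case over the given modes."""
    def dist(group):
        return math.sqrt(sum((case[m] - means[group][m]) ** 2 for m in modes))
    return min(means, key=dist)

def percent_correct(cases, means, modes):
    """Percent of (coefficients, true label) pairs classified correctly."""
    hits = sum(nearest_group(coeffs, means, modes) == label
               for coeffs, label in cases)
    return 100.0 * hits / len(cases)

def stepwise_select(cases, means, candidates):
    """Rank modes by individual skill, then keep each one only if skill increases."""
    ranked = sorted(candidates,
                    key=lambda m: percent_correct(cases, means, [m]),
                    reverse=True)
    selected = [ranked[0]]
    best = percent_correct(cases, means, selected)
    for mode in ranked[1:]:
        skill = percent_correct(cases, means, selected + [mode])
        if skill > best:  # strict improvement required to retain the mode
            selected.append(mode)
            best = skill
    return selected, best

# Hypothetical group means and cases for three EOF modes (illustrative only).
means = {"R": {1: 1.0, 2: 1.0, 3: 0.0}, "NR": {1: -1.0, 2: -1.0, 3: 0.0}}
cases = [
    ({1: 1.2, 2: 0.8, 3: 0.5}, "R"),
    ({1: 0.9, 2: 1.1, 3: -0.5}, "R"),
    ({1: -0.2, 2: 1.5, 3: 0.3}, "R"),
    ({1: -1.1, 2: -0.9, 3: 0.2}, "NR"),
    ({1: -0.8, 2: 0.2, 3: -0.4}, "NR"),
    ({1: -0.7, 2: 0.1, 3: 0.1}, "NR"),
]
selected, best = stepwise_select(cases, means, [1, 2, 3])
print(selected, best)
```

In this toy data set Mode 3 has identical means in both groups, so it never changes a classification and is not retained. Relaxing the test to "greater than or equal" would retain it, which shows how a looser retention rule can admit modes that carry no information.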

Fig. 10. Euclidean method classification skill into recurvers and straight-movers
using randomly selected EOF predictors. Classification skill for the clean set cases
using 100 sets of ten randomly selected EOF modes (top) and using 100 sets of EOF
Mode 1 plus nine other randomly selected modes (bottom). Clean set cases in each
time-to-recurvature group (R-00h through R-96h) (abscissa) are distinguished from the clean
set straight-mover cases (NR). The percent correct classifications (ordinate) is for
both the recurving and straight-moving storm cases.

Surprisingly, only EOF Mode 1 is included in the 700 mb set for the R-00h
group using this stepwise screening process. That is, no other mode increases the
classification skill relative to EOF Mode 1 alone. However, the skill of this Mode 1
Euclidean model in distinguishing clean set recurver and straight-mover cases (top, Fig.
11) rapidly declines from 95% for R-00h to 75% for R-36h and only 50% at R-96h.
This result is consistent with Fig. 10, and indicates that Mode 1 alone is not adequate
for distinguishing recurvers versus straight-movers at other times prior to recurvature.
When the stepwise addition of predictors is applied at each of these times, multiple
modes are selected. The skill of these sets of predictors to distinguish recurver and
straight storm cases (bottom, Fig. 11) ranges from 80% to 95%. Consequently, this result
indicates the optimum performance of a Euclidean model with the dependent set of
clean storms. In practice, the time to recurvature is unknown and the forecaster would
not know which of these sets for individual times would apply. The objective is then to
select a set of predictors that can be applied at all times, but does not degrade too
severely from the optimum performance at the individual times shown in Fig. 11.

Potential overall best sets are formed from the EOF modes included in the
separate sets determined for each time step in Fig. 11 plus other time-step sets. Two
additional time-step sets are formed using the less restrictive selection criterion that inclusion
of a specific EOF mode does not change (degrade) skill. In another time-step set
selection approach, the EOF modes simply are entered in numerical order, rather than
according to their relative skill in discerning storm type. Using the lower mode EOF
coefficients, which are less likely to contain noise than the higher modes and are related 
to larger scale features in the vorticity fields, may provide more reliable separation 
among the classification groups. A summary of these five selection criteria for the 
time-step sets is given in Table 6. Since each of these selection criteria leads to the 
inclusion of different EOF modes in the Euclidean method for the time-step groups, no 
consensus is evident for use in forming the overall best sets. 

Various subjective criteria involving the number of times an EOF mode is se- 
lected for one of the individual time-step sets are tested to form an overall best set. For 
the collection of R-00h through R-96h predictor sets selected using one of the criteria 
A through E in Table 6, a mode may be required to appear in a certain number of these 
individual sets to be included in a potential overall best set. Each potential overall best
set of predictors is evaluated by scoring the Euclidean distance classifications into the
12-h time-to-recurvature groups (R-00h through R-96h) plus PRNR. Classification
matrix scores for the dependent clean set (161 cases) and for the learning set cases not
belonging to any of the clean set storms (458) are presented in Table 7.


Fig. 11. Euclidean method classification skill using time-step sets of EOF
predictors. Classification skill as in Fig. 10, except for only the EOF Mode 1 (top) and
for separate sets of EOF predictors at each time-to-recurvature group (bottom).
Recurvers (dotted), straight-movers (dashed) and the total correctly classified in
both storm categories (solid) are indicated.


Table 6. EOF MODE SELECTION CRITERIA FOR EUCLIDEAN METHOD
TIME-STEP PREDICTOR SETS: Sets are formed from the stepwise
selection of the first 45 EOF modes in the order (column 2) of their
individual skill in distinguishing between clean set recurvers and
straight-movers (predictability) and/or simply in numerical order. A mode is
selected (column 3) if the new set skill is greater than (GT), or greater than
or equal to (GE) the skill before the addition of that mode. This stepwise
process is continued until all 45 modes are tested. Then, the total number
of predictors retained in the set is limited to the number specified in
column 4. “NONE” indicates that no restriction is placed on the total
number of predictors that may be retained in the set. “10 (MIN)” indicates
that only the minimum number of modes required to achieve the same skill
as the first ten modes selected in the stepwise selection process are
ultimately retained in the time-step set.


SELECTION    SELECTION         MODES RETAINED IF NEW
CRITERIA     ORDER             SET SKILL IS GE OR GT BEFORE    LIMIT


             PREDICTABILITY
             PREDICTABILITY
             PREDICTABILITY
             NUMERICAL
             NUMERICAL

The stability of the Euclidean model is judged first by comparing the D-score
for the independent and dependent samples. This D-score evaluates only the categories
R-00h through R-96h plus NR that comprise the dependent sample. As expected, skill
is best (D-score = 1.78-2.07) for the dependent set classifications. The degradation in
the D-scores for the independent sample, which range from 2.56 to 2.73, is not linear.
For example, the second-best score for the independent sample (2.57) is for a model that
has the worst D-score (2.07) for the dependent sample. In addition, the model with the
best D-score (1.78) in the dependent sample has one of the worst D-scores with the
independent sample. Notice that higher skill is attained when the selection of the EOF
predictors is according to their relative predictability (lines 1-6) than if selection is simply 
in numerical order (lines 7-9). However, the selection of such a large number of EOF 
predictors, and especially the selection of such high order modes as 41 and 42, is a 

Table 7. CLASSIFICATION SKILL FOR TWO METHODS OF SELECTING
EUCLIDEAN MODEL PREDICTORS: 700 mb classification skill in
terms of D- and I-scores for the independent set forecasts (first two
columns) and D-scores for the dependent (clean set) forecasts. Predictor
EOF modes in lines 1-9 are from various subjective combinations of the
sets of predictors that separate the individual time-to-recurvature groups
(R-00h through R-96h) from the PRNR group. Selection criteria for these
time-step sets (column 4) are explained in Table 6, and the number in
parentheses indicates the number of R-00h through R-96h sets in which an
individual mode must have appeared to be retained in the potential overall
best set. R-00h (lines 10 and 11) time-step sets are also evaluated as
Euclidean model predictors.


INDEPENDENT:          DEPENDENT:   TIME-STEP SET
D-SCORE    I-SCORE    D-SCORE      SELECTION CRITERIA    MODES:


SETS OF MODES COMMON TO R-00H THROUGH R-96H SETS:


1.92       4 10 30 31 34
84         4 8 10 20 23 25 30 31 34 37 38 41 42
           4 20 28 30 31 34 37 42 42
           20 25 30 31 34 36 41 42
           13 18


           7 8 9 10 32
           7


R-00H SETS:


2.55
2.48


concern. It may be that the dependent set is being well described, but this is at the 
expense of degradation in the independent sample performance. 

Another subjective, but physically based, approach can be used in the selection
of predictors for the Euclidean method. Recall that the time-to-recurvature coefficients
in Fig. 4 trace out a smooth path in the EOF 1 - EOF 2 domain. Since these coefficients
are in time order, increasing the geometric distance between the beginning and end time
coefficients should also increase the distances between the intermediate time values.
Thus, the hypothesis is that the set of predictors that best distinguishes between the
R-00h and NR clean set cases may also be best for identifying the intermediate 12-h
time-to-recurvature cases. To test this hypothesis, the R-00h sets formed using the
stepwise selection criteria A and C in Table 6 are evaluated as overall Euclidean model
predictor sets (bottom group, Table 7). Both sets are selected from the first 45 EOF
modes in order of their individual predictability. As indicated above, only EOF Mode
1 enters the 700 mb model if selection criterion A is applied. If the mode retention
criterion is relaxed to just greater or equal (selection criterion C), seven additional modes are
included in the set. These additional modes significantly improve dependent sample
classification skill (D-score = 1.87 versus 2.27 obtained using Mode 1 alone). Rated
on the D-score performance, independent sample classification skill is also higher (2.48
versus 2.55 for Mode 1 alone). Surprisingly, the I-score for the independent sample
classifications is slightly lower for Mode 1 alone (2.40, versus 2.41 with the additional
modes). This indicates that the
independent sample PR cases, included only in the I-score, must be well classified using
EOF Mode 1 alone. Both Euclidean models based on the R-00h sets demonstrate higher
skill in classifying the independent sample (D-score = 2.48-2.55 and I-score =
2.40-2.41) than those models based on the predictors common to all R-00h through R-96h
sets (D-score = 2.56-2.73 and I-score = 2.52-2.60). Therefore, the conclusion from
these tests is that the EOF modes that best distinguish between the R-00h and
straight-mover cases also provide the best Euclidean model skill in identifying the correct 12-h
time-to-recurvature (R-00h through R-96h) or non-recurvature (PRNR) forecast group.

Based on the above conclusions, the search for an overall best set of predictors
for the Euclidean model is confined to those sets that provide the best distinction
between the clean set R-00h and straight-mover cases. Since the problem is reduced to the
separation of only two categories of data, univariate hypothesis testing can be used to
identify the modes with the greatest difference between R-00h and NR mean values.
Individually these modes provide the greatest separation between the R-00h and PRNR
groups in one-dimensional space. Therefore, some combinations of these modes also
should provide the best separation of the R-00h and PRNR classification groups in
multidimensional EOF space. An EOF mode is identified as having significantly
different R-00h and NR means if the p-value for a two sample t-test of the clean set R-00h
and NR coefficients for that EOF mode is less than or equal to 0.01. Since the p-value
is the smallest significance level at which the null hypothesis (that the R-00h and NR
means are equal) can be rejected, this test objectively identifies the modes with the
greatest separation between the R-00h and NR means (Fig. 12). As expected, the largest
difference between the mean EOF values for the R-00h and NR groups at 700 mb is for
Mode 1. Ten other modes also have significant differences in mean EOF values
according to this test. Overall predictor sets are then chosen from among these significant
modes using the stepwise selection criteria described in Table 8.
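
The significance screening can be sketched as below. The text does not state whether a pooled-variance or unequal-variance t-test was used; this sketch assumes the pooled form. Rather than computing a p-value, it compares |t| with the two-sided 0.01 critical value from a standard t table, which is equivalent to requiring a p-value of 0.01 or less. The function names, sample sizes and coefficient values are hypothetical.

```python
import math
from statistics import fmean, variance

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance (an assumption of this sketch)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (fmean(x) - fmean(y)) / math.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))

def significant_modes(r00, nr, t_crit):
    """Modes whose |t| meets the critical value, ordered by decreasing |t|."""
    stats = {m: abs(pooled_t(r00[m], nr[m])) for m in r00}
    keep = [m for m in stats if stats[m] >= t_crit]
    return sorted(keep, key=lambda m: stats[m], reverse=True)

# Hypothetical clean set coefficients for two EOF modes, five cases per group.
r00 = {1: [2.0, 2.5, 3.0, 2.2, 2.8], 7: [0.1, -0.2, 0.3, 0.0, -0.1]}
nr = {1: [-1.0, -0.5, -1.5, -0.8, -1.2], 7: [0.0, 0.2, -0.3, 0.1, -0.1]}

T_CRIT = 3.355  # two-sided 0.01 critical value for 5 + 5 - 2 = 8 degrees of freedom
selected = significant_modes(r00, nr, T_CRIT)
print(selected)
```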

Euclidean model skill in identifying recurvers and straight-movers is compared
in Fig. 13 for R-00h predictors selected from only the significant modes (top) and for the

Table 8. SIGNIFICANT MODE SELECTION CRITERIA FOR R-00H
PREDICTOR SETS: Criteria as in Table 6, except applied only to those
modes identified by a two sample t-test as having significantly (p-value <
0.01) different R-00h and NR mean coefficient values.


SELECTION    SELECTION         MODES RETAINED IF NEW
CRITERIA     ORDER             SET SKILL IS GE OR GT BEFORE    LIMIT


             NUMERICAL
             NUMERICAL
             NUMERICAL
             PREDICTABILITY
             PREDICTABILITY

straight-movers than the Mode 1 model in Fig. 11 (top), and less skill in the R-36h
through R-96h periods for the optimum time-step model in Fig. 11 (bottom). One
advantage of the model based on the significant modes is that the separate levels of skill
for recurvers and straight-movers are more consistent. By contrast, the nearly equal
combined skill for the numerical EOF mode model is gained by much better skill for
straight-movers than for recurvers.

The final step in the Euclidean distance model development is then to evaluate 
the R-00h predictor sets (F through J in Table 8) and identify the set and pressure level 
with the highest time-to-recurvature classification skill. The classification matrix scores 
for the independent and dependent sample classifications for the EOF modes selected 
on the basis of hypothesis tests are presented in Table 9. 

Even though the number of EOF modes is limited by the significance testing,
the selection criteria in Table 8 can lead to different Euclidean models. Except for the
700 mb Mode 1 model, the largest sets of predictors are selected at 700 mb (6-8 modes)
and 400 mb (4-8 modes). The 250 mb models using only two or five modes demonstrate
the best skill in identifying the 12-h time-to-recurvature groups in the dependent sample
(250 mb D-score = 1.81-1.83, 400 mb D-score = 1.81-1.92 and 700 mb D-score =
1.93-2.27). As noted previously for the Euclidean models in Table 7, the degradation in
the D-score for the independent samples typically is not linear. For the independent
sample, the skill for 250 mb (D-score = 2.43-2.45 and I-score = 2.40-2.45) and 700 mb
(D-score = 2.44-2.55 and I-score = 2.36-2.40) is nearly comparable. Less skill is

Fig. 13. Euclidean method classification skill using R-00h sets of EOF
predictors. Classification skill for 700 mb as in Fig. 10, except for an R-00h set
chosen (selection criteria I in Table 8) from only those modes identified with
significantly different R-00h and NR clean set EOF mean coefficient values according to
a two sample t-test (top), and for an R-00h set of a similar number (eight) of modes
selected (similar to criteria D in Table 6, except limited to eight vice ten predictors)
from all 45 EOF modes (bottom).











Table 9. CLASSIFICATION SKILL FOR EUCLIDEAN MODELS AT THREE
PRESSURE LEVELS: Classification skill as in Table 7 for the five
selection criteria F through J (described in Table 8) of EOF mode predictors
at each pressure level with significantly different (two sample t-test p-value
< 0.01) clean set R-00h and NR mean EOF coefficient values.

noted at 400 mb (D-score = 2.56-2.67 and I-score = 2.49-2.62). These sets of Euclidean
model predictors identified by significance testing tend to outperform the R-00h sets
selected (criteria A through E in Table 6) from all 45 modes (not shown), unless by
chance they contain the same modes.

Judged on the independent sample classification matrix D-scores, the best 
Euclidean distance model using the 250 mb vorticity includes EOF Modes 1, 6, 10, 12 
and 15 (lines 12 and 14). Two advantages of this set are that only five predictor variables
are required and no EOF mode greater than 15 is included. By contrast, the best 700
mb set selected using criteria I has eight EOF modes, and includes higher order modes
such as 31, 34 and 39.


C. MODEL EVALUATION 

The final Euclidean model at 250 mb is evaluated in terms of skill in classifying the 
learning set of 782 cases (Table 10). The combined skill in correctly identifying recurvers 
(75%) and straight-movers (68%) during the 72-h forecast period is 71%. This
compares with %R, %S and %T scores of 36, 64 and 54 for climatology (Table 5). Skill in
identifying the time to recurvature is best near recurvature (R-00h = 45%, R-12h =
21% and R-24h = 35%), and in the straight-track categories (R-96h = 38% and PRNR
= 35%). The higher skill at the ends of the forecast interval may be because the EOF
predictor modes were selected to achieve maximum separation of the R-00h and the NR
mean EOF coefficients. In addition, there may be more variability in the vorticity fields
as recurvature conditions develop (R-36h through R-84h).


Table 10. CLASSIFICATION MATRIX FOR FINAL EUCLIDEAN
MODEL: Classifications for observations in each 12-h verification
category and the percent correctly forecast by the 250 mb Euclidean model.
Percent of recurvers and straight-movers correctly predicted is also listed. 


MODES: 1 6 10 12 15


                                 CLASSIFICATION
            VERIFY   CORRECT     00 12 24 36 48 60 72 84 96 PRNR


RECURVER:   R-00H    (45%)
(75%)       R-12H    (21%)
            R-24H    (35%)
            R-36H    (17%)
            R-48H    (22%)
            R-60H    (12%)
            R-72H    (13%)


STRAIGHT:   R-84H    (10%)       6 10
(68%)       R-96H    (38%)       1 4&4 2 56 0 9
            PRNR     (35%)       5 16 24 26 23 22 17 120 135


TOTAL                (71%)



Bar charts (Fig. 14) of the percent of learning set cases in each 12-h verification
category that are classified into each time-to-recurvature group further confirm the
relatively poor ability of the Euclidean method for the R-84h through R-36h cases. The
intermediate 12-h categories not shown in Fig. 14 tend to have characteristics similar to
the 24-h bar charts. Times near recurvature and in the straight-track categories are
better classified and are also more likely to be classified within only one or two
classification groups of the correct value. Cases in the intermediate forecast intervals (R-36h
through R-84h) are more likely to be misclassified, and the classification errors, in terms
of the number of 12-h categories between the forecast and the verification groups, are
greater.

The Euclidean model classifications presented above have higher skill than the
climatological forecasts of the learning set cases (Table 5). For example, the I- and
R-scores of these Euclidean model 12-h forecasts are 2.34 and 2.10 versus 3.93 and 5.30
for the climatological forecasts, respectively. However, the skill in identifying the time
to recurvature is less than desired for operational use. Because of the subjectivity in the
development of the Euclidean model, these results should not be used to make final
conclusions regarding the usefulness of an EOF representation of vorticity to forecast
tropical cyclone recurvature. No definitive method was found for selecting the optimum
set of EOF predictors in the Euclidean method. In addition, each EOF mode that is
selected is given the same weighting, rather than assigning additional influence to the
modes that have the most significance. Furthermore, the use of a small clean set of
storms in this approach may not provide the most robust definition of the
time-to-recurvature classification groups. Nevertheless, the Euclidean method has an easily
understood physical interpretation for using an EOF approach in identifying the vorticity
patterns associated with recurvature. The above results indicate the approximate levels
of skill that can be expected using these predictors. However, a more objective approach
is needed to identify the optimum set of EOF predictors and to better exploit the relative
contributions of each mode in the recurvature forecast model.


Fig. 14. Classification bar charts at 24-h intervals for the Euclidean
model. Percent of N cases (ordinate) verifying as R-00h (top left), R-24h (top
right), R-48h (middle left), R-72h (middle right), R-96h (bottom left), and PRNR
(bottom right) that are classified into each group R-00h through PRNR (abscissa).
Shaded bars indicate the percent in the correctly classified category.


IV. DISCRIMINANT ANALYSIS APPROACH


The approach in this section is to use discriminant analysis techniques to better ex- 
ploit the predictive skill of EOF coefficients of vorticity in forecasting tropical cyclone 
recurvature. The UCLA Biomedical Computer Program BMDP7M (Dixon 1988) is used 
to select the predictors and develop the discriminant analysis model. Although 
discriminant analysis is a seemingly more objective approach than the Euclidean distance 
method, the user must still make many choices both in its application and evaluation. 
Searching for the optimum discriminant analysis model requires extensive testing and 
should be conducted on a much larger sample population. Thus, the goal of this study 
is to isolate a justifiable prediction model that indicates the potential of this method. 


A. DISCRIMINANT ANALYSIS 

Discriminant analysis is a statistical procedure for identifying the boundaries be- 
tween groups in terms of the variable characteristics that distinguish one group from 
another. It is used to classify cases into one of several groups and to examine the
relative contributions of one or more variables in distinguishing between groups. The
procedure was first introduced by Fisher (1936). The Fisher discriminant function has the
form 


Z = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n, (4.1)


where Z is the discriminant score, X_1, X_2, ..., X_n are the values of each predictor and
a_1, a_2, ..., a_n are coefficients that, if standardized by pooled standard deviations, give an
indication of the relative weight of each predictor. Discriminant functions are derived
such that the differences in discriminant scores or the relative distances between groups
are maximized. The first function separates the members of the most distinguishable
group, K_1, from the remainder of the groups, K_2 through K_g. The second discriminant
function separates the next most recognizable group, K_2, from the remaining groups, K_3
through K_g. The number of functions required is one less than the total number of
groups, g. For each discriminant function, a cutoff score is found by taking the mean
of the average score for all cases in the group K_i and the average score for all cases in
the remainder of the groups K_{i+1} through K_g. An individual case is classified into group
K_1 if its discriminant score Z_1 is greater than the cutoff score for the first discriminant
function. If the discriminant score is less than the cutoff, a second discriminant score
is calculated using the second discriminant function. The second discriminant score is
compared to a second cutoff score to determine if the case is in group K_2 or the
remaining groups K_3 through K_g. The process continues until the case is classified.
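
The cutoff rule described above can be written compactly. The symbols c_i and \bar{Z}_i(\cdot) are introduced here only for illustration and do not appear in the original text; the numbers in the worked case are hypothetical.

```latex
% Cutoff for the i-th discriminant function: midpoint of the two average scores
c_i = \tfrac{1}{2}\left[\, \bar{Z}_i(K_i) + \bar{Z}_i(K_{i+1},\ldots,K_g) \,\right],
\qquad \text{classify into } K_i \text{ if } Z_i > c_i .

% Hypothetical worked case for the first function:
% \bar{Z}_1(K_1) = 2.0, \quad \bar{Z}_1(K_2,\ldots,K_g) = -1.0
% \Rightarrow c_1 = \tfrac{1}{2}(2.0 - 1.0) = 0.5 .
% A case with Z_1 = 0.8 > 0.5 is assigned to K_1;
% a case with Z_1 = 0.1 < 0.5 passes on to the second function.
```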

A simpler adaptation of Fisher’s classification procedure is used in statistical
packages such as BMDP7M (Klecka 1980 and Dixon 1988). A classification function for
each group is derived as a linear combination of coefficients and predictors plus a
constant term. Predictors can be specified or they can be selected in a stepwise fashion
based on user-specified criteria. To determine group membership, each function is 
evaluated using the predictor values of the test case to obtain a classification function 
score for each group. The case is classified into the group for which it has the highest 
classification score. 
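
A minimal sketch of this classification-function approach follows. It assumes equal prior probabilities and a pooled within-group covariance matrix, which is the textbook form of linear classification functions; it is not taken from BMDP7M itself, and the function names and toy data are hypothetical.

```python
import numpy as np

def fit_classification_functions(X, y):
    """Return one linear function (coefficients + constant) per group."""
    groups = np.unique(y)
    n, p = X.shape
    means = np.array([X[y == k].mean(axis=0) for k in groups])
    W = np.zeros((p, p))
    for k, m in zip(groups, means):
        d = X[y == k] - m
        W += d.T @ d                              # within-group sums of squares
    S = W / (n - len(groups))                     # pooled covariance matrix
    coefs = np.linalg.solve(S, means.T).T         # one coefficient row per group
    consts = -0.5 * np.sum(coefs * means, axis=1) # constant terms (equal priors)
    return groups, coefs, consts

def classify(x, groups, coefs, consts):
    """Evaluate every group's function and pick the highest score."""
    scores = coefs @ x + consts
    return groups[np.argmax(scores)]

# Toy data: two predictors, two groups (values invented for illustration).
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [5, 5], [6, 5], [5, 6], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
groups, coefs, consts = fit_classification_functions(X, y)
near_first = classify(np.array([0.2, 0.4]), groups, coefs, consts)
near_second = classify(np.array([5.5, 5.0]), groups, coefs, consts)
```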

Classification function coefficients cannot be standardized and interpreted in the 
same manner as discriminant function coefficients because there is a different function 
for each group. However, discriminant functions can be computed from classification 
functions to examine the relationship between predictors and group classification (Afifi 
and Clark 1984). More commonly, statistics derived from canonical correlation analysis 
techniques are used for this purpose. Canonical correlation analysis examines the linear 
relationship between independent variables (predictors) and one or more sets of de- 
pendent variables (groups). A linear combination of predictors called a canonical vari- 
able or canonical discriminant function is formed that provides the best separation 
among groups. Second and subsequent canonical discriminant functions are then 
formed that are orthogonal and best separate the groups on the basis of associations not 
used in the preceding canonical discriminant functions. The maximum number of 
canonical discriminant functions is equal to the number of groups minus one or the 
number of predictor variables, whichever is less. Canonical discriminant functions can 
also be used to classify. Final classifications will generally be identical to those obtained 
with classification functions unless the group covariance matrices are not equal (Klecka 
1980). A complete discussion of the application of canonical correlation statistics to 
discriminant analysis can be found in Klecka (1980) or Afifi and Clark (1984). 
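
As an illustration of how canonical discriminant functions can be obtained, the sketch below uses the standard eigen-decomposition of the between-group scatter relative to the within-group scatter. This is a generic formulation assumed for the example, not the procedure of any particular package, and the synthetic data are hypothetical.

```python
import numpy as np

def canonical_discriminant_functions(X, y):
    """Eigenvectors of W^-1 B, ordered by decreasing eigenvalue."""
    groups = np.unique(y)
    n, p = X.shape
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))  # within-group scatter
    B = np.zeros((p, p))  # between-group scatter
    for k in groups:
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        d = Xk - mk
        W += d.T @ d
        diff = (mk - grand_mean)[:, None]
        B += len(Xk) * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1]
    n_funcs = min(len(groups) - 1, p)  # groups minus one, or predictors, whichever is less
    return eigvals[order][:n_funcs], eigvecs[:, order][:, :n_funcs]

# Synthetic example: three groups, four predictors (values are arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = np.repeat([0, 1, 2], 20)
X[y == 1] += 2.0
X[y == 2, 0] -= 3.0
eigvals, V = canonical_discriminant_functions(X, y)
scores = X @ V  # coordinates of each case in discriminant space
```

With three groups and four predictors, only two functions are retained, so each case has two coordinates in discriminant space.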


B. MODEL ISSUES 
Several issues basic to the development of a discriminant analysis model are con- 
sidered in this section. These issues include the selection of a dependent sample, how far 
in advance recurvature can be recognized, and the optimum number and composition 
of the classification groups. Decisions in these areas are based on the ability of EOF 
modes to predict recurvature as well as the classification goals of the forecast model. 
These decisions, in combination with choices in the application of the discriminant 
analysis method, will affect the level of classification skill that can be achieved with a 
given set of predictors and predictands. Only 250 mb data are considered in this section, 
because the data for this level provided the best discriminating power in the Euclidean 
distance approach and in comparative tests (not shown) using discriminant analysis. 
1. Dependent Sample Selection 

Ideally, the sample population should be divided into dependent and independ- 
ent subsets to permit validation of the discriminant analysis classification model. Clas- 
sification functions may be fit well to a small dependent sample, but not be effective in 
predicting an independent sample. Independent testing is thus necessary to better esti- 
mate the ability to correctly predict the total population. Opinions vary on the appro- 
priate sizes of the subsets. However, the dependent subset must be sufficiently large to 
ensure the stability of the classification function coefficients (Klecka 1980). 

Several aspects of the discriminant analysis must be specified to test the effect 
of the dependent subset options. As a first test, the classification groups will be the same 
ten categories as in the Euclidean distance approach: recurvature time to recurvature 
time minus 96 h in 12-h increments plus the non-recurvers. Although only straight- 
mover storm data are used to describe the non-recurver group while developing the 
discriminant analysis model, later tests will consider the observations more than 96 h 
prior to recurvature as part of the straight-mover set. Classification functions are de- 
rived from predictors selected in a stepwise fashion using a common F-to-enter value of 
2.5. Although dependent subsets vary in size from 158 to 510 cases, this F-to-enter value 
is significant at better than the 99th percentile for all subsets. Therefore, differences in 
predictors selected in the discriminant analysis procedure can be mainly attributed to 
statistical differences among the dependent subsets. The classification functions then 
are used to classify both the dependent subset and the remaining independent cases, in- 
cluding pre-recurver cases. Classification matrix scores (described in Section 11.C.2) are 
computed for dependent and independent subset classifications separately and for the 
entire sample classifications. 

The purpose of the intercomparison of the classification models derived from 
13 different dependent subsets of the 250 mb sample (Table 11) is to test the stability 
of the classification functions. Whole-storm data from the same set of clean recurving 
and straight-moving storms used in the Euclidean distance approach are used to form the first 
dependent subset. Since EOF coefficients tend to progress in a similar manner as storms 
approach recurvature, analyzing a subset comprised of entire storms may lend statistical 
stability to the analysis. Two other whole-storm dependent subsets are formed from all 
1979 to 1982 storms and from a random selection of two-thirds of the storms in the 
sample population. To test the stability of these classification functions, ten dependent 
subsets are formed by random selection of two-thirds of the cases in the sample 
population. 
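
The difference between selecting whole storms and selecting individual cases can be sketched as follows. The helper names, the storm identifiers and the random seed are hypothetical; only the two-thirds fraction comes from the text.

```python
import numpy as np

def split_by_storm(storm_ids, fraction=2/3, seed=0):
    """Dependent-subset mask that keeps every case of a chosen storm together."""
    rng = np.random.default_rng(seed)
    storms = np.unique(storm_ids)
    n_dep = int(round(fraction * len(storms)))
    dep_storms = rng.choice(storms, size=n_dep, replace=False)
    return np.isin(storm_ids, dep_storms)

def split_by_case(n_cases, fraction=2/3, seed=0):
    """Dependent-subset mask drawn from individual cases, ignoring storms."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(n_cases, dtype=bool)
    chosen = rng.choice(n_cases, size=int(round(fraction * n_cases)), replace=False)
    mask[chosen] = True
    return mask

# Nine hypothetical storms with differing numbers of 12-h cases.
storm_ids = np.repeat(np.arange(9), [3, 5, 2, 4, 6, 3, 5, 4, 2])
storm_mask = split_by_storm(storm_ids)
case_mask = split_by_case(30)
```

In the whole-storm split the dependent subset size depends on which storms are drawn, whereas the case-based split fixes the number of cases but scatters each storm across both subsets.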


Table 11. DEPENDENT SUBSET SELECTION: Stepwise discriminant analysis 
of the times to recurvature for 13 dependent/independent subsets of 250 
mb vorticity EOF coefficients, which are indicated in the order they were 
selected. 


INDEPENDENT DEPENDENT COMBINED DEPENDENT PREDICTORS 

D-SCORE D-SCORE D-SCORE SUBSET N (EOF MODES): 
2.86 2.59 CLEAN STORMS 158 
2.35 2.40 79-82 STORMS 510 536 6 45 
2.62 1.99 2.37 RANDOM STORMS 449 641 7 6 
2.47 2.37 2.41 RANDOM CASES 1 1 
2.27 2.20 2.23 RANDOM CASES 2 
2.32 2.08 2.15 RANDOM CASES 3 
2.49 2.35 2.36 RANDOM CASES 4 
2.50 2.08 2.25 RANDOM CASES 5 
2.66 2.48 2.50 RANDOM CASES 6 
2.38 2.24 2.32 RANDOM CASES 7 
2.33 2.23 2.22 RANDOM CASES 8 
2.11 2.15 2.10 RANDOM CASES 9 
2.12 2.12 2.06 RANDOM CASES 10 




The results in Table 11 reflect differences due to the independent sample com- 
position as well as to the dependent sample composition. If the classification functions 
were very stable, the various methods of subsampling in Table 11 should have involved 
the same predictors and have nearly equivalent dependent-independent verification 
scores. In practice, predictors vary in number, modes and in the order selected. This 
order is not necessarily indicative of their relative importance because a strong 
discriminator may be selected late or not at all in a stepwise analysis if the intercorre- 
lation with other variables reduces its unique contribution to the analysis. Mode 1 EOF 
coefficient is the only predictor selected for all 13 dependent subsets tested. Modes 5, 2 
and 3 are selected in 12, 11 and 10 of the subsets, respectively. Since Mode 6 appears in 
eight of the subsets, it is potentially important in the discriminant analysis. Notice that 
Mode 4 is selected only once, and that many higher order modes are selected after Mode 
6. 

Classification matrix scores also vary. Classification functions derived for the 
clean storm sample in Table 11 demonstrate a high degree of skill in classifying the de- 
pendent subset cases, but perform poorly in classifying independent subset cases. The 
combined classification matrix score for this subset pair is much worse than for other 
pairs, which indicates that the model is well-fitted to the dependent subset only and is 
not accurate on an independent subset. This result may suggest a flaw in the use of the 
clean storm set for the Euclidean method, where the excellent distinction in the depend- 
ent set was not sustained in the remaining cases. 

The independent test results can be overly optimistic if the subset contains a 
disproportionate number of cases that are statistically easy to classify. For example, the 
classification functions derived from 1979-1982 storm data demonstrate better skill in 
classifying the independent subset (1983-1984) than the dependent subset. This unex- 
pected result can be explained by examining the differences in the storms between the 
two subsets. Patrick Harr (personal communication) found that western North Pacific 
tropical cyclones during 1983 and 1984 had recurvature tracks that were similar to 
climatology and were relatively easy to forecast in comparison to those in the previous 
four years. 

The 10 random subsets in Table 11 were generated to test whether the classi- 
fication functions derived from dependent subset predictors would work equally well on 
the independent subset. In other words, the randomly selected cases should have nearly 
equal classification matrix scores. Only the two subset pairs formed from the ninth and 
tenth randomly selected cases have nearly equal dependent and independent classifica- 
tion matrix scores. Because the classification model derived from randomly selected 
storms (line 3 in Table 11) outperforms those derived from randomly selected cases in 
dependent subset classification, retaining data from entire storms in the dependent set 
may aid in the derivation of skillful classification functions. However, the marked de- 
gradation in the independent D-score for the random storm independent subset indicates 
that the differences in skill are also a function of which subset contains more storms that 
are inherently easier to classify. 

The conclusion from Table 11 is that classification functions derived from ran- 
domly selected subsamples of this data set are not statistically stable. To improve the 
stability, the entire sample population will later be used to both derive and test 
classification functions. In lieu of independent testing, jackknifing is employed to assess 
the degradation in classification skill expected in the total population. In this procedure, 
N sets of classification functions are derived by successively withholding one case from 
the sample of N cases. Each of the N sets of classification functions is tested on the one 
case that was withheld, and the summation of these verifications is an indication of the 
likely accuracy of a single discriminant analysis based on the entire sample. Although 
jackknifed results are computed for each discriminant analysis, they will be presented 
only in the selection and testing of an optimal classification model for this sample of 
storms (Sections IV.D and IV.E.1). 
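
The jackknife procedure can be sketched as a leave-one-out loop. To keep the example short, a nearest-group-mean classifier stands in for the discriminant analysis; that substitution, the function names and the toy data are assumptions made only for illustration.

```python
import numpy as np

def jackknife_accuracy(X, y, fit, predict):
    """Withhold each case in turn, refit on the other N-1, test on the one withheld."""
    n = len(y)
    hits = 0
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(X[keep], y[keep])
        hits += int(predict(model, X[i]) == y[i])
    return hits / n

# Stand-in classifier (not the thesis method): assign to the nearest group mean.
def fit_centroids(X, y):
    groups = np.unique(y)
    return groups, np.array([X[y == k].mean(axis=0) for k in groups])

def predict_centroid(model, x):
    groups, means = model
    return groups[np.argmin(np.linalg.norm(means - x, axis=1))]

# Toy sample of N = 6 cases in two well-separated groups.
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
accuracy = jackknife_accuracy(X, y, fit_centroids, predict_centroid)
```

Any classifier exposing a fit step and a single-case prediction step could be passed in place of the stand-in.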
2. Limits of discrimination for time to recurvature 

A basic question is the limitation of the discriminant analysis to separate the 
EOF coefficients associated with storms more than 96 h before recurvature from the 
straight-mover coefficients. To illustrate this limitation, univariate statistics for EOF 
Mode 1 coefficients are compared. Mode 1 not only accounts for the largest percentage of 
the variance in the synoptic vorticity patterns, but also demonstrates the greatest pre- 
dictive capability. It is the only predictor consistently selected and is selected first in all 
subset analyses (Table 11). 

Distributions of the Mode 1 means, 95% confidence intervals and the standard 
deviations for times R-00h through R-96h in 12-h increments, plus the pre-recurvers 
(PR) and non-recurvers (NR) in the entire 250 mb data set are presented in Fig. 15. 
Univariate statistics for a combined PR and NR group are also plotted. Group statistics 
can be interpreted in terms of the physical processes they represent. Recall that the 
pattern for Mode 1 (Fig. 2) is representative of the vorticity pattern associated with a 
storm in the monsoon trough and that the magnitude of the coefficient is indicative of 
the importance of the pattern (or the opposite pattern if it is negative). Group means 
vary almost linearly from large positive values for non-recurver and pre-recurver situ- 
ations to large negative values at recurvature. Variances are large and the considerable 
overlap among groups indicates the variability in vorticity patterns that lead to recur- 
vature. These group means are most separated in the 36 h preceding recurvature. 
However, variances are also largest during these times. These large differences are as- 
sociated with rapid changes in the storm-centered vorticity patterns accompanying 
storms moving around the subtropical ridge. In contrast, vorticity patterns change little 
for storms moving along the monsoon trough well prior to recurvature. 

The challenge for the discriminant analysis (or the Euclidean method) is to dis- 
tinguish those EOF modes that best indicate the time to recurvature. With the similar 
group means for NR, PR, R-96h and R-84h times, it is unlikely that the discriminant 
analysis could consistently separate these groups from Mode 1 only. The NR group 
mean is slightly smaller than the group means for pre-recurvers and the R-96h cases. 
Combining non-recurver and pre-recurver subsamples provides a smoother and more 
physically plausible transition among groups. That is, the R-96h and R-84h samples 
might also have been added to the new PR and NR group. However, since the official 
JTWC forecast period is 72 h, retaining the R-96h and R-84h as separate classification 
categories provides a forecast ‘buffer’. The R-96h and R-84h predictions provide an 
alert of a trend toward recurvature, but not within the current 72-h forecast period. 
Statistically, these intermediate groups decrease the likelihood that non-recurvature sit- 
uations will be misclassified into the next similar group, and thus prompt the forecaster 
to erroneously predict recurvature within the 72-h forecast period. 

Based on the above considerations, the PR sample is combined with the 
straight-mover sample to define the PRNR classification group. The merits of other 
data combinations are better assessed in terms of gains in classifiability versus loss of 
time resolution. These issues are explored in the next section. 


[Figure 15: horizontal axis, EOF Mode 1 coefficient, -16 to 12] 

Fig. 15. Univariate statistics for 250 mb vorticity EOF Mode 1. Mean (open 
circle), 95% confidence interval (solid bar) and standard deviation (x) for individual 
groups (solid line) and for the combined PR and NR group data (dotted line). 

3. Number and composition of classification groups 

The analysis goal is to fully utilize the time resolution of the data set to predict 
the time to recurvature in 12-h increments. However, the EOF coefficients for the 
vorticity fields may not have enough discriminating power to reliably discern between 
synoptic situations with this time resolution. Perhaps these predictors would be suited 
to classify some combination of groups with decreased time resolution. Thus, combi- 
nations of groups are tested to increase the percent of correct classifications and still 
retain some of the time resolution desired by the forecaster. 

The univariate distributions of EOF Mode 1 coefficient for each 12-h group in 
Fig. 15 indicate that this predictor alone cannot adequately discriminate among neigh- 
boring groups. Other EOF modes may provide additional dimensions that distinguish 
differences among the 12-h groups. The effects of multiple predictors and of 
combining groups on classification skill are best examined by discriminant analyses and 
canonical correlation statistics. 

To evaluate the trade-off between time resolution and forecast accuracy, com- 
binations of time groups are tested that are potentially easier to classify and are still 
useful to the forecaster. Stepwise discriminant analysis is performed using F-to-enter 
values significant at the 0.01 level for the sample size. Analysis models with two, three 
and ten classification groups are compared in Tables 12, 13 and 14. Verifications are 
identified by their 12-h data categories so that the loss of time resolution in the classi- 
fication groups can be appreciated. Pre-recurver data are combined with non-recurver 
data for both classifications and verifications. 

The minimum useful distinction for the forecaster is between recurving and 
straight-moving storms. The recurver group is defined by R-00h through R-72h sam- 
ples, and the straight-moving group is defined by R-84h through pre-recurver and 
straight-mover data. Thus, a successful prediction would identify either a recurving or 
straight track during the current 72-h forecast period. The two-group discriminant 
analysis (Table 12) correctly identifies R-00h to R-72h cases as recurvers with 76% ac- 
curacy. The verifications within each 12-h category do not have the same skill. The 
percent correctly classified decreases from 95% for cases at recurvature time to 44% at 
R-72h. This is because times closer to recurvature are more distinct from the straight 
classification group and are, therefore, more readily recognized as recurvers. Non- 
recurvature is correctly predicted for 81% of the R-84h through PRNR cases. The 
combined sample skill is 79%, but this total is dominated by skill in prediction of 
straight track motion because of the larger number of non-recurver cases (445) than 
recurver cases (337) in the sample population. 
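
As a consistency check on these figures, the combined skill can be approximated as the case-weighted average of the two group accuracies. The inputs are the rounded percentages quoted above, so the result is approximate:

$$\frac{0.76 \times 337 + 0.81 \times 445}{337 + 445} \approx \frac{256.1 + 360.5}{782} \approx 0.79$$

The straight-track group contributes 445 of the 782 cases, which is why the combined value lies closer to 81% than to 76%.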


Table 12. TWO-GROUP DISCRIMINANT ANALYSIS MODEL: Percent of 
recurvers and straight-movers in the sample population correctly pre- 
dicted by the two-group discriminant analysis model with the EOF modes 
indicated. Number of classifications as recurvers or straight-movers are 
provided with 12-h time resolution to indicate at what times this analysis 
model succeeds or fails. 


MODES: 1 3 5 36 41 24 14 38 43 6 39 27 


CLASSIFICATION 
VERIFY RECURVER STRAIGHT 


RECURVER: 


STRAIGHT: 
(81%) 


TOTAL (79%) 





A more useful distinction to the forecaster would be separation into high, me- 
dium and low likelihood of recurvature (Table 13). The high group is defined as the 
R-00h to R-24h cases, the medium group as all R-36h to R-72h cases, and the low 
group as the R-84h through PRNR cases. Classification functions correctly classify the 
sample into high, medium or low categories with 75%, 56% and 73% accuracy, respec- 
tively. While this three-group classification scheme increases the time resolution in the 
recurvature prediction, the ability to correctly classify track types during the 72-h fore- 
cast period (77%) is less than that in the two-group model (79%). In addition, skill in 
identifying straight-moving situations is degraded. The addition of another recurver 
group can be viewed as increasing the number of correct classification categories for 
recurver cases and increasing the number of incorrect classification categories for non- 
recurvers. The intermediate group also increases the discriminant analysis model 
separation between the 12-h data categories in the high recurver group and the 
non-recurver categories. 


Table 13. THREE-GROUP DISCRIMINANT ANALYSIS MODEL: Percent of 
recurvers and straight-movers correctly predicted by a three-group 
discriminant analysis model. Format is similar to Table 12. 


MODES: 1 3 2 4 5 6 14 41 9 24 38 12 36 15 37 


CLASSIFICATION 


RECURVER: HIGH 
(82%) (75%) 


MED 
(56%) 


STRAIGHT: LOW 
(73%) (73%) 


TOTAL (77%) 


Discriminant analysis into the ten 12-h classification groups R-00h to R-96h 
plus PRNR (Table 14) maximizes the time resolution of the predictions for this data set, 
but at the expense of classification accuracy. The ability to distinguish between recurver 
and straight-track situations is only 72% as compared to the 79% and 77% accuracy 
achieved by the two-group (Table 12) and the three-group (Table 13) forecast models, 
respectively. Also, the ability to discern high, medium and low likelihood of recurvature 
is 2-6% less than for the three-group model. The improvement in recurver classification 
skill relative to skill in forecasting straight track situations between the two-group model 
and three-group model is again noted. The ten-group model correctly classifies recurver 
and straight situations with 79% and 67% accuracy, respectively. The greater difference 
in recurver versus straight classification skill for the two-group model (6%) than for the 
three-group model (2%) may be due to the increase in the ratio of the number of 
recurver to straight groups from 2.0 (two-group model) to 2.3 (three-group model). As 
expected, classification accuracy within each 12-h verification category is considerably 
less than for the broader high-medium-low or recurver-straight categories. Higher skill 
exists in correctly classifying cases at the extremes of the forecast continuum, i.e., at re- 
curvature (60%) and PRNR (48%). Skill in identifying cases in the intermediate cate- 
gories ranges from 15% to 38%. 
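
Verifying a fine-resolution model against broader categories amounts to collapsing its classification matrix. The sketch below shows the idea on a hypothetical four-category matrix; the counts are invented and are not the thesis results.

```python
import numpy as np

def collapse(confusion, mapping, n_coarse):
    """Sum the cells of a fine classification matrix into coarser groups.
    mapping[i] gives the coarse group index of fine category i."""
    coarse = np.zeros((n_coarse, n_coarse), dtype=confusion.dtype)
    for i, gi in enumerate(mapping):
        for j, gj in enumerate(mapping):
            coarse[gi, gj] += confusion[i, j]
    return coarse

def percent_correct(confusion):
    """Cases on the diagonal as a percentage of all cases."""
    return 100.0 * np.trace(confusion) / confusion.sum()

# Rows: verifying category; columns: classified category (hypothetical counts).
fine = np.array([[6, 3, 1, 0],
                 [3, 4, 2, 1],
                 [1, 2, 5, 2],
                 [0, 1, 3, 6]])
coarse = collapse(fine, mapping=[0, 0, 1, 1], n_coarse=2)
fine_skill = percent_correct(fine)
coarse_skill = percent_correct(coarse)
```

Misclassifications into a neighboring category within the same broad group count as correct after collapsing, so the coarse score is never lower than the fine score for the same matrix.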


Table 14. TEN-GROUP DISCRIMINANT ANALYSIS MODEL: Percent of 
recurvers and straight-movers correctly predicted by a ten-group 
discriminant analysis model. Format is analogous to Tables 12 and 13. 


MODES: 1 2 3 5 4 6 24 45 9 14 41 


CLASSIFICATION 
VERIFY CORRECT 00 12 24 36 48 60 72 84 96 PRNR 


RECURVER: HIGH R-00H 
(79%) (73%) R-12H 
R-24H (26%) 
MED (19%) 
(53%) 


STRAIGHT: LOW 
(67%) (67%) 


TOTAL (72%) 





The distributions of group centroids in discriminant space (Figs. 16, 17 and 18) 
provide a graphic representation of the ability of the two-, three-, and ten-group 
discriminant analysis models to separate, and thus classify, the cases belonging to each 
group. Individual cases and group centroids may be located in discriminant space using 
canonical correlation analysis techniques discussed in Section IV.A. An x- and y- 
coordinate for each case is found by evaluating the first and second canonical 
discriminant functions, respectively, using the predictor values for the case. Group 
centroids are then located at the mean of the x-coordinates and the mean of the y- 
coordinates for all cases in the classification group. The mean coordinates of the cases 
in each 12-h verification category, hereafter referred to as time centroids, are also com- 
puted for the two- and three-group models (Figs. 16 and 17). Because these time 
centroids are equivalent to the group centroids for the ten-group model, they provide a 
means of comparing the relative separation achieved by each of the three discriminant 
analysis models. 

As the 12-h time centroids are in sequential order, they reflect the time trends 
in the patterns accompanying recurvature. Notice that the relative separations between 
consecutive time centroids are generally similar along the first canonical discriminant 
function for all three models (Figs. 16, 17 and 18). These relative distances between 
group centroids indicate how well the model is able to distinguish between groups. The 
proximity of the 12-h time centroids to their parent group centroid gives an indication 
of the classification accuracy for each 12-h verification category or combination of 12-h 
categories. 
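
The relationship between group centroids and time centroids can be sketched as below. The coordinates are hypothetical canonical discriminant scores chosen for illustration, with one case per 12-h category for brevity; they are not values read from Figs. 16 to 18.

```python
import numpy as np

def centroids(coords, labels):
    """Mean coordinate of the cases carrying each label."""
    return {str(k): coords[labels == k].mean(axis=0) for k in np.unique(labels)}

# Hypothetical coordinates on the first two canonical discriminant functions.
coords = np.array([[-2.0, 0.1], [-1.0, -0.1], [0.5, 0.0], [1.5, 0.2], [2.0, -0.2]])
time_cat = np.array(["R-00h", "R-36h", "R-72h", "R-84h", "PRNR"])
group_of = {"R-00h": "recurver", "R-36h": "recurver", "R-72h": "recurver",
            "R-84h": "straight", "PRNR": "straight"}
group = np.array([group_of[t] for t in time_cat])

group_c = centroids(coords, group)    # one centroid per classification group
time_c = centroids(coords, time_cat)  # one centroid per 12-h category

# Which group centroid lies closest to each time centroid?
nearest = {t: min(group_c, key=lambda g: np.linalg.norm(c - group_c[g]))
           for t, c in time_c.items()}
```

In this toy configuration the R-72h time centroid falls nearer the straight-group centroid even though it belongs to the recurver group, which is the kind of situation in which a category is prone to misclassification.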

Since the number of canonical discriminant functions computed is one less than 
the number of groups, a one-dimensional plot is presented for the two-group model (Fig. 
16). The time centroids of the R-84h through PRNR verification categories that com- 
prise the straight-track group are all closer to the straight-group centroid than to the 
recurver-group centroid. In addition, the R-72h time centroid (which belongs in the 
recurver group) is closer to the straight-group centroid. While the actual distribution 
of individual cases determines the model skill reported in Table 12, the relative positions 
of the time and group centroids in Fig. 16 illustrate why the model is better at correctly 
classifying straight track cases (81%) than it is at correctly identifying recurver cases 
(76%). The fitted distributions of the straight-mover and recurver cases confirm these 
observations. The cases in the straight group are more closely distributed about their 
group centroid than are the cases in the recurver group. The amount of overlap between 
the two distributions indicates the number of cases that may be misclassified by the 
two-group model. As previously noted, canonical discriminant functions can be used to 
classify by computing a canonical discriminant function score (not shown) to divide the 
cases into straight-movers and recurvers. Such a dividing line in Fig. 16 would give an 
exact representation of the number of cases that would be misclassified into each group 
by the first canonical discriminant function. 

In the multiple-group discriminant analysis models, classification skill is a 
function of how well an individual group is separated from its neighboring groups and 
the actual distribution of its individual members. The three-group and ten-group model 
centroids (Figs. 17 and 18) are spatially separated in a curvilinear fashion that reflects a 
consistent time trend in the group centroids. These separation patterns explain why the 
classification skill is highest for the groups on either end of the time spectrum. Consider 
a normal distribution of sample cases for each group in an ellipsoidal pattern centered 
around each group centroid. For a middle group with neighbors on either side, classi- 
fication skill is a function of its separation from neighboring groups and the distribution 
of the individual cases belonging to the group. Since the end groups have no neighbor- 
ing group on one side, sample cases on the no-neighbor side of the distribution will be 
classified into the end group. For both the three- and ten-group models (Tables 13 and 
14), classification skill is notably higher in the end groups. Note that for the three-group 
model, the skill is higher for the high likelihood of recurvature group than for the low like- 
lihood of recurvature group because the high group is better separated from the inter- 
mediate medium group than the low group is. Differences in group classification skill for 
the ten-group model are more difficult to interpret. For example, the R-00h to R-36h 
group centroids are better separated in Fig. 18 than those for R-48h to R-96h. In gen- 
eral, better separated groups in Fig. 18 demonstrate better classification skill. One ex- 
ception is the R-72h group, which has higher skill than the more separated groups and 
may reflect the effect of the distribution of the sample cases on classification skill. 

Fig. 16. Canonical discriminant function centroids for the two-group 
model. Group centroids are located at the mean of all cases in each group (vertical 
lines). Time centroids for PRNR through R-00h in 12-h intervals are indicated by 
(O) for categories belonging to the straight group and (X) for categories belonging 
to the recurver group. Fitted distributions of the straight-movers and recurvers 
along the first canonical discriminant function axis illustrate the overlap of the dis- 
tributions. 


Ultimately, the number and composition of classification groups must be a 
trade-off between the forecaster's need to specify a precise time of recurvature and the 
diminishing skill as more precision is attempted. To illustrate this trade-off in forecast 
accuracy and time resolution, a ten-group model will be pursued using the entire 250 
mb sample population. The ten-group model is chosen to fully test the predictive ca- 
pability of EOF representation of the synoptic vorticity fields to predict recurvature at 
the resolution of the data set. 

Fig. 17. Canonical discriminant function centroids for the three-group 
model. The first two canonical discriminant functions form the axes along which 
group centroids (solid markers) and 12-h time centroids (open markers) are plotted 
for high (circles), medium (triangles) and low (squares) likelihood of recurvature 
classification groups. 

Fig. 18. Canonical discriminant function centroids for the ten-group model. The 
first two canonical discriminant functions form the axes along which group centroids 
(solid circles) are plotted for R-00h through PRNR classification groups. 


C. APPLICATION 

Discriminant analysis packages include options that permit flexibility in the selection 
of predictors and in the method of computing the classification functions. The optimal 
application of these program features is, of course, a function of the goals of the analy- 
sis. In this section, several discriminant analysis options available in BMDP7M are 
considered. These features include prior probabilities, contrasts and three different 
methods for entering predictors into the analysis. An in-depth discussion of all the fea- 
tures available in computer packages and a comparison of BMDP7M with other 
discriminant analysis packages can be found in Tabachnick and Fidell (1989). 

1. Adjusting for prior probabilities and the cost of misclassification 

The prior probability is the probability that an individual case selected for a 
group is actually a member of that group. Unless otherwise specified, cases are assumed 
to have an equal probability of belonging to any group, and the classification functions 
are derived with equal probabilities of misclassification. Specifying group probabilities 
in the analysis procedure changes the ratio of the probability of errors by adjusting the 
discriminant function scores, or equivalently by adjusting the constant terms in the 
classification functions to achieve a ratio of errors consistent with the designated prior 
probabilities. Prior probabilities have the greatest effect on classification skill when 
groups are relatively indistinct from one another. 

In this study, prior probabilities could be assigned on the basis of the group 
sizes in the sample or according to the climatological probability that a synoptic situ- 
ation belongs to each group. For example, prior probabilities based on the relative size 
of each of the ten 12-h groups would range from 50% for the PRNR group to 3-7% for 
the remaining groups. Thus, a discriminant analysis model including these prior proba- 
bilities would only classify a case into one of the groups with a low prior probability if 
there was very strong evidence that it was not in the PRNR group. Prior probabilities 
could also be used to achieve some other desired ratio of errors that would be better 
suited to the needs of the forecaster. That is, if the cost of misclassification of a certain 
group was high, assigning a high prior probability to that group would decrease the 
likelihood that a case belonging to that group would be misclassified into another group 
with a lower prior probability. 
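
The effect of prior probabilities on the constant terms can be sketched as follows. The example assumes the common formulation in which the natural logarithm of each group's prior is added to its classification score; whether BMDP7M applies the adjustment in exactly this form is not stated in the text, and the scores are hypothetical.

```python
import numpy as np

def classify_with_priors(equal_prior_scores, priors):
    """Add ln(prior) to each group's score, then pick the highest adjusted score."""
    adjusted = np.asarray(equal_prior_scores) + np.log(np.asarray(priors))
    return int(np.argmax(adjusted))

# Two nearly indistinct groups: the priors decide the outcome.
close_equal = classify_with_priors([2.0, 2.5], [0.5, 0.5])
close_skewed = classify_with_priors([2.0, 2.5], [0.9, 0.1])

# Two well-separated groups: the same priors change nothing.
far_equal = classify_with_priors([0.0, 10.0], [0.5, 0.5])
far_skewed = classify_with_priors([0.0, 10.0], [0.9, 0.1])
```

The contrast between the two pairs of calls illustrates why priors matter most when groups are relatively indistinct from one another.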

Assigning prior probabilities may be advantageous in future applications of 
discriminant analysis to fine-tune the recurvature forecast model using EOF predictors. 
Since such adjustments to the analysis may not give a true indication of the 
discriminatory power of the EOF coefficients, prior probabilities will not be specified in 
this study. 

2. Contrasts to direct stepwise selection of predictors 

In discriminant analysis, a contrast is a series of coefficients, one for each clas- 
sification group, that modifies the stepwise selection of predictors. The coefficient for 
each group indicates the relative amount of differentiation desired between groups. 
Coefficients must be specified such that the sum of the coefficients for all groups equals
zero. Contrasts do not affect the computation of the classification functions as do 
prior probabilities. Rather, contrasts affect the computation of the F-to-enter and
F-to-remove statistics. Therefore, they alter the stepwise selection of predictors such 
that only those predictors that maximize the differences between groups are selected in 
the analysis. Thus, contrasts can also be used to determine which predictors are im- 
portant in distinguishing between specific pairs of groups. In developing a ten-group 
discriminant analysis model, this is not practical because of the large number of pairwise 
tests to be considered. Furthermore, it may not be a sound method of selecting a final
set of predictors because the predictors that are useful in distinguishing between some 
pairs may adversely affect discrimination between other pairs.

Several contrasts are tested in Table 15. Because F-to-enter computations are 
different for each analysis, stepwise selection of EOF modes is stopped after selection 
of the ten predictors. No contrasts are specified in the first analysis in Table 15 to allow 
comparison with the different contrasts. The second analysis is designed to maximize 
the difference between the recurvature cases and the straight-track cases. The third 
maximizes distinction among all recurver situations (less than 72 h) and among all 
straight situations equally. The fourth and fifth analyses are designed to maximize the
distinction between recurvature and non-recurvature situations and also to enhance the distinction among
recurvature groups. 
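
The sum-to-zero requirement and the size of the pairwise problem can be checked with a few lines of code. This is only an illustrative sketch: the helper name is hypothetical, the sign convention of the two-group contrast is an assumption, and the track-type weights are invented to show one way of balancing seven recurver groups against three straight groups.

```python
from itertools import combinations

GROUPS = ["R-00h", "R-12h", "R-24h", "R-36h", "R-48h",
          "R-60h", "R-72h", "R-84h", "R-96h", "PRNR"]

def is_valid_contrast(coeffs, n_groups=len(GROUPS)):
    """A contrast needs one coefficient per group, and the coefficients must sum to zero."""
    return len(coeffs) == n_groups and abs(sum(coeffs)) < 1e-9

# Two-group contrast: separate R-00h from PRNR and ignore the other groups.
r00_vs_prnr = [-1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# Hypothetical track-type contrast: seven recurver groups against three straight
# groups, weighted so that the two sides balance (7 * -3 + 3 * 7 = 0).
recurver_vs_straight = [-3] * 7 + [7] * 3

# Number of distinct pairs of groups in a ten-group model.
n_pairs = len(list(combinations(GROUPS, 2)))

print(is_valid_contrast(r00_vs_prnr), is_valid_contrast(recurver_vs_straight), n_pairs)
```

The 45 distinct pairs illustrate why testing a separate contrast for every pair of groups is impractical in a ten-group model.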

Maximizing the difference between R-00h and PRNR groups (line 2 in Table
15) improves the overall classification skill (I-score), but skill in distinguishing time to 
recurvature (R-score) is less than using no contrasts. The contrast designed to increase 
the differences equally among all recurver and straight groups (line 3) improves
Table 15. COMPARISON OF MODELS USING CONTRASTS: Effect of various
contrasts on predictor selection and discriminant analysis model
performance. Predictor modes are in the order selected.


I-SCORE R-SCORE %R %S %T CONTRASTS (00 THRU PRNR) PREDICTOR MODES:




2.1138 1.8665 NONE 23 8 4 62445 916 
2.0959 1.9228 -1,0,0,0,0,0,0,0,0,1 74514 5 24


2.1867 1.7033 9B 39-3593 9-3 9-3 9-35797 97 
2.0320 1.7718 $B 97 9-6 9-B 3-4-3 9-2 9-150 536 
2.2097 1.6647 7 9-6 9-B 99-4 993 9-29-19 99959 





discrimination among times to recurvature (R-score), but the overall classification skill 
(I-score) is less than without contrasts (line 1). Contrasts designed with the additional 
goal of improving the ability of the model to correctly forecast the time to recurvature 
(lines 4 and 5) result in improvement in the time accuracy of recurver forecasts (R-score) 
and in the ability to recognize recurver situations (%R = 82 and 85, respectively). 
However, this is at the expense of weaker discrimination of straight-movers (%S = 67 
and 64, respectively). 

Except for EOF Mode 1, the modes selected and the order of selection vary for
the various contrasts tested in Table 15. Higher mode predictors are selected earlier in
those analysis models with contrasts that are designed to increase differences among 
multiple groups (lines 3-5) than in the analysis model using a simpler contrast between 
only two classification groups (line 2). 

In summary, specifications of different contrasts as in Table 15 do lead to im- 
proved recurvature-related scores or straight-mover scores relative to a discriminant 
analysis model without contrasts. However, both scores are not improved simultane- 
ously. Due to the complexities of a ten-group model, the changes in predictors and 
forecast skill produced by these contrasts are difficult to interpret. As in the specification
of prior probabilities, this analysis feature may be better utilized to fine tune an
EOF forecast model than to demonstrate the usefulness of EOF’s in identifying recurvature
situations. Since the forecast skill without the use of contrasts (line 1) is comparable
to that with contrasts (lines 2 through 5), the contrasts feature will not be used
further in this study. 
3. Direct, hierarchical and stepwise selection of predictors

Discriminant analysis model performance depends primarily on the 
discriminatory power of the predictor variables. When many potential predictors are 
available, such as the first 45 EOF coefficients for synoptic vorticity considered in this 
study, the question is which combination of predictors will produce the best distinction 
among classification groups and in what order they should enter the analysis.

Three options of selecting and entering predictors in the discriminant analysis 
are the direct, hierarchical and stepwise methods. In the direct method, the predictors 
are selected by the user and all are entered into the analysis in one step. In the hierar- 
chical method, the user specifies both the predictors and the order they enter the analy- 
sis. The stepwise method relies on statistical criteria specified by the user to select the 
predictors and determine their order of entry. 

The direct and hierarchical discriminant analysis methods are advantageous be- 
cause they allow the user to control the predictors in the analysis. However, they require 
prior knowledge of the relative discriminatory value of each potential predictor. Except 
for EOF Mode 1, which is shown in Section II.A.2 to represent a straight-mover situation
with the storm in the monsoon trough or a recurvature situation depending upon
the value of the coefficient, little can be inferred about the potential discriminatory 
power of the increasingly complex patterns for the EOF coefficients. Therefore, a step- 
wise discriminant analysis will be used to select the most significant predictors. 
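
The first step of a stepwise selection can be sketched in a few lines. Before any predictor has entered, the F-to-enter for a candidate reduces to the one-way analysis of variance F statistic for that candidate across the groups. The sketch below covers only that first step; later steps require the partial statistic that accounts for predictors already in the model. The function names, the threshold default and the data are hypothetical.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one predictor.

    groups: list of lists, each holding the predictor values for one group.
    """
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def first_predictor(candidates, f_to_enter=1.0):
    """Return the candidate with the largest F, or None if it fails the threshold.

    candidates: dict mapping a predictor name to its grouped values.
    """
    f_values = {name: anova_f(groups) for name, groups in candidates.items()}
    best = max(f_values, key=f_values.get)
    return best if f_values[best] >= f_to_enter else None

# Hypothetical coefficients for two groups of three cases each.
candidates = {
    "mode_1": [[1.0, 1.2, 0.8], [3.0, 3.1, 2.9]],   # group means differ strongly
    "mode_2": [[1.0, 3.0, 2.0], [2.0, 1.0, 3.0]],   # identical group means
}
print(first_predictor(candidates))
```

In this example the first candidate separates the groups and is selected, whereas the second has identical group means and an F of zero.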


D. FINAL MODEL DEVELOPMENT 

The final model to predict time to recurvature with 12-h resolution is developed with 
a stepwise analysis of the entire sample population. Potential predictors are selected 
from the first 45 EOF coefficients representing the 250 mb synoptic vorticity fields. No 
prior probabilities and no contrasts are specified. 

The final question is what criterion should limit the number of predictors selected in the
stepwise analysis. Mathematically, the maximum number of predictors is equal to the 
total number of cases in the sample minus two (Klecka 1980). Although all 45 EOF 
coefficients could be used to develop a discriminant analysis model for these data, the 
objective is to obtain the best classification skill with the fewest possible predictors. 
Such a parsimonious solution is sought by performing ten stepwise discriminant analyses 
(Table 16). The first analysis is restricted to one mode, the second is restricted to two 
modes, and so on until ten modes are selected in the tenth analysis model. Low F-to- 
enter (1.0) and F-to-remove (0.996) values are specified in the analysis procedure to 
ensure the selection of up to ten predictors. However, the F-to-remove values (not
shown) for the modes selected in the analyses in Table 16 indicate that each of the 
selected modes is significant to the separation of recurver groups at the 0.01 level or 
better. 


Table 16. STEPWISE SELECTION OF ONE TO TEN MODES: Classification
skill in terms of I-score, R-score and percent correctly classified recurver
(R-00h to R-72h) or straight (R-84h to PRNR) for ten discriminant analyses
that are limited to one to ten modes successively. Jackknifed results
(discussed in Section IV.B.1) reflect the skill expected with
independent testing.


MODEL RESULTS: 
An optimal forecast model is selected from Table 16 by examining gains in classification
skill as the number of modes in the analysis is increased from one to ten. As
expected, the jackknifed results in columns six through ten are worse than the learning 
model results in columns one through five. The general trend is toward improved skill 
(smaller penalty scores and higher percent correct classifications) as the number of 
modes increases from one to ten. The seven-mode discriminant analysis model best 
meets the analysis objectives because the addition of Mode 24 as the seventh mode im- 
proves all measures of classification skill relative to the skill for the models with six or 
fewer modes. However, the addition of Mode 45 as the eighth mode results in degra- 
dation in all measures of skill. The addition of Mode 9 in the nine-mode model produces 
only a slight improvement over the eight-mode analysis and only results in skill scores 
nearly equal to those for the seven-mode analysis. Some improvement is again noted 
by the addition of a tenth predictor, but the %R (77) is less than the %R (80) for the
seven-mode model. The seven-mode discriminant analysis model is also preferable be- 
cause predominantly low-numbered EOF modes are used in the analysis. Since these 
lower modes represent large-scale patterns and account for a larger fraction of the vari- 
ance in the synoptic vorticity fields than the higher modes, the coefficients for the lower 
mode EOF’s should be better discriminators of recurvature than those for higher modes.
Furthermore, the higher mode predictors may represent noise in the synoptic field, yet 
be statistically useful in predicting recurvature in this data set. Based on these consid- 
erations, the seven-mode model in Table 16 is chosen to demonstrate the potential of 
discriminant analysis of the EOF representation of synoptic vorticity fields to forecast
time to recurvature with 12-h resolution. 


E. FINAL MODEL EVALUATION 
The discriminant analysis model derived in Section IV.D from the stepwise selection 
of seven EOF coefficients of 250 mb vorticity is indicative of the forecast skill obtainable 
with this analysis method and these data. In this section, the final discriminant analysis 
model is evaluated and compared to the Euclidean distance model derived in Section
III.B. Since both the discriminant analysis and Euclidean distance models were derived
from the 250 mb vorticity data, the comparison is between the two analysis methods. 
1. Forecast skill 
The classification matrix for the final discriminant analysis model is presented 
in Table 17. The model correctly identifies synoptic situations that will lead to recurvature
within the next 72-h forecast period with 80% accuracy. Skill for straight-track
situations is 66%. Thus, there is a greater chance of a false alarm of recurvature than 
a missed recurvature prediction. The combined skill in predicting track type is 72%. 
Group classification skill is best near recurvature (R-00h = 60%, R-12h = 29%, and 
R-24h = 29%) and in the straight-track categories (R-96h = 29% and PRNR = 47%). 
Skill in the intermediate categories only ranges from 7-22%. This result was anticipated 
based on initial testing with the ten-group discriminant analysis model in Section IV.B.3
(Table 14). The ten-group model in Section IV.B.3 is the eleven-mode model that would 
have been included in Table 16 if the stepwise selection of one to ten modes had been 
carried one step further. It was derived using the same analysis options from the step- 
wise selection of the same seven modes in the final discriminant analysis model plus 
Modes 45, 9, 14 and 41. Canonical discriminant functions for the final discriminant 
analysis model (not shown) are nearly identical to those in Fig. 18 for the ten-group 
model and show relatively little separation among the centroids for the intermediate
R-84h through R-36h groups.
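
The track-type percentages quoted above follow from the classification matrix by counting a case as correct whenever the verifying and forecast groups fall on the same side of the recurver and straight split, even if the 12-h group is wrong. A minimal sketch of that bookkeeping is given below. The function name is hypothetical and the matrix is invented for illustration; it does not reproduce the counts in Table 17.

```python
N_RECURVER = 7  # R-00h through R-72h; the remaining groups are R-84h, R-96h and PRNR

def track_type_skill(matrix):
    """Return (%R, %S, %T) from a square classification matrix.

    matrix[i][j] is the number of cases verifying in group i that were
    classified into group j, with groups ordered R-00h ... R-96h, PRNR.
    """
    hits = {"R": 0, "S": 0}
    totals = {"R": 0, "S": 0}
    for i, row in enumerate(matrix):
        kind = "R" if i < N_RECURVER else "S"
        for j, count in enumerate(row):
            totals[kind] += count
            # Correct track type: verifying and forecast groups on the same side.
            if (j < N_RECURVER) == (i < N_RECURVER):
                hits[kind] += count
    pct_r = 100.0 * hits["R"] / totals["R"]
    pct_s = 100.0 * hits["S"] / totals["S"]
    pct_t = 100.0 * (hits["R"] + hits["S"]) / (totals["R"] + totals["S"])
    return pct_r, pct_s, pct_t

# Hypothetical matrix: each recurver group has 8 cases on the diagonal and 2 sent
# to PRNR; each straight group has 6 on the diagonal and 4 sent to R-72h.
matrix = [[0] * 10 for _ in range(10)]
for i in range(10):
    if i < N_RECURVER:
        matrix[i][i], matrix[i][9] = 8, 2
    else:
        matrix[i][i], matrix[i][6] = 6, 4

print(track_type_skill(matrix))
```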


Table 17. CLASSIFICATION MATRIX FOR THE TEN-GROUP 
(SEVEN-MODE) MODEL: Classifications for data in each 12-h verifi- 
cation category and the percent correctly forecast by the ten-group 
(seven-mode) discriminant analysis model. Percent of recurvers and 
straight-movers correctly predicted is also listed. 


MODES: 1 2 3 5 4 6 24


CLASSIFICATION
VERIFY CORRECT 00 24 36 48 60 72


RECURVER: R-00H
(80%)


STRAIGHT:


TOTAL (72%)





Bar charts (Fig. 19) of the percent of cases in each 12-h verification category 
that are classified into each group further illustrate the relatively poor ability of the 
model to pinpoint the time to recurvature among the R-84h through R-36h cases. The
intermediate 12-h categories not shown in Fig. 19 tend to have characteristics interme- 
diate to the 24-h bar charts. Cases belonging to the better separated groups (R-00h, 
R-24h, R-96h, and PRNR in Fig. 19) are more frequently correctly classified or classified 
within one to two 12-h groups of the correct classification group than those belonging
to the less separated groups (R-48h and R-72h in Fig. 19). The R-36h (not shown), and 
to a lesser extent the R-48h through R-72h cases, are classified into all groups with 
nearly the same frequency, which reflects little ability to correctly distinguish the time
to recurvature for these synoptic situations. Notice that only 28% (30%) of the R-48h 
(R-36h) cases were misclassified into straight-track groups. 

2. Additional model output to assist the forecaster 

The discriminant analysis model classifies an individual case into the group that
has the highest classification function score (discussed in Section IV.A). Discriminant
Fig. 19. Classification bar charts at 24-h intervals for the ten-group (seven-mode)
model. Percent of N cases (ordinate) verifying as R-00h (top left), R-24h (top
right), R-48h (middle left), R-72h (middle right), R-96h (bottom left), and PRNR
(bottom right) that are classified into each group R-00h through PRNR (abscissa). 
Shaded bars indicate the percent in the correctly classified category. 
analysis provides additional information that may assist the forecaster in subjectively
assessing the validity of a model forecast. These outputs are the Mahalanobis distances 
and the posterior probabilities. 

The Mahalanobis distance (D²) is the squared distance of an individual case to
each group centroid. Since D² has the same properties as the chi-squared (χ²) statistic
with degrees of freedom (df) equal to the number of predictors, Mahalanobis distances
are measured in chi-square units.

The posterior probability is the probability that an individual case belongs to a
group, which is calculated from D² by assuming the cases in each group are clustered
around the centroid in a multivariate normal distribution and that every case belongs to
one of the groups. Posterior probabilities are more useful to the forecaster because a
set of nearly equal (small) percentage values for the time categories indicates the likely
uncertainty in time to recurvature. Because posterior probabilities are used subjectively
in this study, their contribution to forecast skill will not be evaluated in this section.
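
The two outputs can be related with a short sketch. It assumes equal prior probabilities and a common (pooled) covariance matrix for all groups, in which case the posterior probability for each group is proportional to exp(-D²/2). The function names, centroids and inverse covariance matrix are hypothetical and use only two predictors for brevity.

```python
import math

def mahalanobis_sq(x, centroid, inv_cov):
    """Squared Mahalanobis distance D^2 = (x - m)^T C^-1 (x - m)."""
    d = [xi - mi for xi, mi in zip(x, centroid)]
    n = len(d)
    return sum(d[r] * inv_cov[r][c] * d[c] for r in range(n) for c in range(n))

def posteriors_from_d2(d2_values):
    """Posterior probabilities under equal priors and a common covariance matrix."""
    smallest = min(d2_values)  # shift by the minimum for numerical stability
    weights = [math.exp(-(d2 - smallest) / 2.0) for d2 in d2_values]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical two-predictor example with an identity inverse covariance matrix.
inv_cov = [[1.0, 0.0], [0.0, 1.0]]
centroids = [[0.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
case = [1.0, 0.0]

d2 = [mahalanobis_sq(case, m, inv_cov) for m in centroids]
probs = posteriors_from_d2(d2)
print(d2)
print([round(p, 3) for p in probs])
```

The case lies midway between the first two centroids, so those two groups receive equal posterior probabilities. This is the kind of ambiguity a forecaster would read as uncertainty between adjacent time categories.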

3. Comparison of discriminant analysis and Euclidean distance models 

Classification skill for the final ten-group discriminant analysis model and the
Euclidean distance model is compared in Table 18. Although the learning data sets differ,
the skill scores reflect the ability of each model to forecast all 782 cases. As expected,
the discriminant analysis model (line 1) outperforms the Euclidean distance
model in all areas except %S (discriminant analysis = 66, Euclidean distance = 68).
Because the learning set for the Euclidean distance model is comprised of only 161 cases,
the results for this model (line 3) are predominantly independent test results. Thus, a 
more equitable comparison is between the jackknifed results for the discriminant analysis 
(line 2) and the Euclidean distance model. In this comparison, the discriminant analysis 
model still outperforms the Euclidean distance model in all areas except %S (jackknifed
discriminant analysis = 65). The conclusion that discriminant analysis is a better 
method for exploiting the predictive capability of the EOF coefficients is based on rela- 
tive performance of the two forecast models. The Euclidean distance method is an in- 
tuitive, and thus more subjective, method of forecasting tropical cyclone recurvature 
using EOF predictors of synoptic vorticity. Discriminant analysis is a statistically-based, 
and thus more objective, method for classifying the time to recurvature. Both analyses 
demonstrate skill compared to the climatological model (line 4) in which predictions are
based on the relative historical frequency of occurrence of each 12-h classification group 
in straight and recurving best track data. 
Table 18. COMPARISON OF MODEL FORECAST SKILL: Classification skill
in terms of I-score, R-score and percent correctly classified recurver
(R-00h to R-72h) or straight (R-84h to PRNR) for the final ten-group
discriminant analysis model and the Euclidean distance model based on
250 mb vorticity EOF coefficients and climatological forecasts based on
1979-1984 best track data. Discriminant analysis jackknifed results
(discussed in Section IV.B.1) reflect the skill expected with independent
testing.


ANALYSIS METHOD LEARNING SET I-SCORE R-SCORE %R %S %T




DISCRIMINANT ANALYSIS ENTIRE SAMPLE 80 66 72 MODES: 1 2 3 5 4 6 24


JACKKNIFED RESULTS . . 79 66 71 


CLIMATOLOGY 1979-1984





F. VIOLATION OF ASSUMPTIONS 
Although discriminant analysis is a robust procedure, the analysis results may be 
adversely affected by the violation of the requirements or assumptions to apply the 
method:
1. two or more distinct groups must be specified;
2. at least two cases must be present in each group;
3. the number of discriminating variables must be less than the total number of cases
minus two;
4. the discriminating variables must be measured such that the differences between
successive values are always the same;
5. the discriminating variables must not be a linear combination of the other
discriminating variables;
6. the variance-covariance matrices must be approximately equal for each group; and
7. the group distributions must be multivariate normal.


Effects of violating the seven discriminant analysis assumptions are explained in detail
in Klecka (1980), who notes that the best guide for a prediction model is the percentage
of correct classifications. If the percentage is high, any violations were not harmful. If
the percentage is low, it could be due to the violation of the assumptions or weak
discriminating variables.








The first four assumptions are met by the data in this study. The BMDP7M pro- 
gram incorporates tolerance criteria in the stepwise selection of discriminating variables 
that protect against violations of multicollinearity (fifth assumption). Homogeneity of 
the variance-covariance matrices (sixth assumption) is more important in classification 
than in statistical inference. Cases tend to be over-classified into more disperse groups. 
Homogeneity of the variance-covariance matrices is tested by examination of the group 
standard deviations for each predictor and by inspection of the scatter plots of the first 
two canonical function scores for the cases in each group. The ten group standard de- 
viations have no gross discrepancies in predictor variance. The largest differences in the 
variances are observed in Mode 1, and range from 39.2 for the R-00h group to 15.4 for
the R-96h group. The canonical discriminant function scatter plots for each group (not 
shown) have roughly equal dispersion, which indicates that the variance-covariance 
matrices are approximately homogeneous. 

Testing the multivariate normality (seventh assumption) of all linear combinations 
of the sample predictors is not currently feasible (Tabachnick and Fidell 1989). How- 
ever, discriminant analysis is robust to violations of normality if they are caused by 
skewness rather than outliers. To test for outliers, the Mahalanobis distance from each 
group centroid to its member cases is evaluated as χ² with degrees of freedom equal to
the number of predictors. Only three of the 782 cases in the sample population (Table
19) exceed the critical χ² = 24.32 at α = 0.001 with seven df. These three outliers are from
recurving storms at R-00h or R-12h and two of them are from the Euclidean distance
clean set storms (TY Vernon and ST Forrest). Eliminating the three multivariate 
outliers from the discriminant analysis (not shown) does not appreciably change the 
classification accuracy for the ten 12-h groups, regardless of whether the same seven 
predictors are hierarchically entered into the analysis or seven new predictors are selected 
in a stepwise fashion. However, the exclusion of these three cases from the sample 
population causes subtle changes in the F-to-enter statistics for each predictor. For ex- 
ample, the first seven modes selected in the stepwise analysis are Modes 1, 2, 3, 5, 4, 45, 
and 6 instead of Modes 1, 2, 3, 5, 4, 6, and 24. Such multivariate outliers should be 
eliminated from the analysis to develop an operational forecast model. Since one goal 
of this study is to compare the classification skill for the discriminant analysis model 
with the Euclidean distance model that was derived using two of these cases, the 
multivariate outliers are not excluded from the final discriminant analysis model.
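
The outlier screening described above amounts to comparing each case's Mahalanobis distance to its own group centroid with the critical chi-square value. A minimal sketch follows. The critical value and the three storm distances are those quoted in this section and in Table 19; the function name and the fourth case are hypothetical.

```python
CRITICAL_CHI2 = 24.32  # alpha = 0.001 with seven degrees of freedom, as quoted in the text

def flag_outliers(cases, critical=CRITICAL_CHI2):
    """Return the names of cases whose D^2 to their own group centroid exceeds the critical value."""
    return [name for name, d2 in cases if d2 > critical]

cases = [
    ("TY LOLA", 25.7),
    ("TY VERNON", 27.9),
    ("ST FORREST", 41.1),
    ("hypothetical case", 9.8),  # invented example of a non-outlier
]
print(flag_outliers(cases))
```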

Kachigan (1982) questioned whether discriminant analysis is an appropriate analysis 
technique for the dichotomization of a continuous criterion variable, such as time to 
Table 19. MULTIVARIATE OUTLIERS: Cases for which the Mahalanobis distance
to the group centroid exceeds the critical χ² value of 24.3 for the
final discriminant analysis model.


STORM   STORM        VERIFICATION   MODEL      MAHALANOBIS DISTANCE TO
NO/YR   NAME         CATEGORY       FORECAST   VERIFICATION GROUP CENTROID

1679    TY LOLA      R-00H          R-12H      25.7
2280    TY VERNON    R-12H          R-00H      27.9
1183    ST FORREST   R-00H          R-00H      41.1





recurvature in this study. Although the recurvature and non-recurvature samples represent
distinct sets, the synoptic situations that lead to recurvature do evolve continuously
in time and thus may not be easily distinguished. Regression analysis may be a
more powerful and efficient analysis procedure since the regression method would fully
utilize the time resolution of the observed data and the time trends in the EOF predictors
to model the time to recurvature as a continuous variable. The ten-group discriminant
analysis model also makes full use of the time resolution of the data, but without any
data transformations that might be required to meet the linearity assumptions of the
regression model. Thus, discriminant analysis provides an efficient first look at the
ability of EOF coefficients of synoptic vorticity to predict time to recurvature.
V. FORECAST EXAMPLES


Operational application of a discriminant analysis model using EOF predictors to 
forecast tropical cyclone recurvature is relatively simple. Only a personal computer or 
programmable calculator would be required to interpolate the analyzed wind fields onto
the storm-centered grid, compute the vorticity at the gridpoints, calculate the EOF
coefficients corresponding to the vorticity field, and solve for the classification function
scores and posterior probabilities for each of the model's classification groups. In
this section, forecast examples from the learning set are presented. The use of posterior 
probabilities to assess the validity of a forecast is also discussed. 
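
The middle steps of this chain are simple enough to sketch. The example below assumes a uniform Cartesian grid, centered finite differences for the relative vorticity, and EOF coefficients obtained by projecting the vorticity anomaly onto each eigenvector. Whether a mean field is removed before projection, and all function names, grid sizes and eigenvectors shown here, are assumptions made for illustration only.

```python
def relative_vorticity(u, v, dx, dy):
    """Centered-difference relative vorticity (dv/dx - du/dy) at interior gridpoints.

    u, v: 2-D lists indexed [row][col], rows increasing northward, columns eastward.
    dx, dy: grid spacing in meters.
    """
    ny, nx = len(u), len(u[0])
    zeta = []
    for j in range(1, ny - 1):
        row = []
        for i in range(1, nx - 1):
            dvdx = (v[j][i + 1] - v[j][i - 1]) / (2.0 * dx)
            dudy = (u[j + 1][i] - u[j - 1][i]) / (2.0 * dy)
            row.append(dvdx - dudy)
        zeta.append(row)
    return zeta

def eof_coefficients(field, mean_field, eigenvectors):
    """Project the flattened anomaly (field - mean) onto each EOF eigenvector."""
    anomaly = [f - m for row_f, row_m in zip(field, mean_field)
               for f, m in zip(row_f, row_m)]
    return [sum(a * e for a, e in zip(anomaly, vec)) for vec in eigenvectors]

# Solid-body rotation (u = -omega * y, v = omega * x) has vorticity 2 * omega everywhere.
omega = 1.0e-5
n = 4
u = [[-omega * j for i in range(n)] for j in range(n)]
v = [[omega * i for i in range(n)] for j in range(n)]
zeta = relative_vorticity(u, v, dx=1.0, dy=1.0)   # 2 x 2 interior points

# Hypothetical zero mean field and two orthonormal eigenvectors on the 4 interior points.
mean_field = [[0.0, 0.0], [0.0, 0.0]]
eigenvectors = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
coeffs = eof_coefficients(zeta, mean_field, eigenvectors)
print(zeta)
print(coeffs)
```

The uniform vorticity field projects entirely onto the first (uniform) eigenvector and not at all onto the second. The resulting coefficients would then be passed to the classification functions.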


A. TEST CASES 

The final discriminant analysis model forecasts are presented for the 1984 examples 
(Fig. 1) of a recurver (ST Vanessa), a straight-mover (TY Agnes) and an odd-mover (ST
Bill). The forecast skill for these three storms is typical of other storms in the data set. 

1. Recurver 

The final discriminant analysis model forecasts of the time to recurvature for
ST Vanessa are shown in Fig. 20. ST Vanessa tracked along the southern side of the
subtropical ridge, which had redeveloped in the wake of TY Tad, for nearly five days 
before recurving (ATCR 1984). Only the two discriminant analysis model forecasts of 
R-72h for times greater than 96 h before recurvature are clearly erroneous. All forecasts
within 72 h of recurvature are correct predictions of a recurver-track type. Although 
only three of the seven recurver-track type forecasts are correct (R-00h, R-48h and 
R-72h), the forecasts all progress in a sequential manner toward recurvature (R-72h, 
R-72h, R-48h, R-48h, R-36h, R-36h, R-00h). The 12-h forecast sequences for most of 
the recurving storms in the sample have a similar progression. Although a prediction 
may be repeated at successive 12-h forecasts and one or more sequential classification 
groups may be skipped between successive forecasts, the predictions tend correctly toward
recurvature. Such a consistent trend toward recurvature in successive operational
forecasts would add confidence to the individual 12-h recurving-track forecasts. 
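
A forecaster could check this kind of consistency automatically. The sketch below tests whether successive forecasts of time to recurvature never increase, allowing repeated and skipped categories. The ST Vanessa sequence is the one quoted above; the function name and the second sequence are hypothetical.

```python
def trends_toward_recurvature(forecasts):
    """True if forecast times to recurvature (h) never increase between successive forecasts."""
    return all(later <= earlier for earlier, later in zip(forecasts, forecasts[1:]))

vanessa = [72, 72, 48, 48, 36, 36, 0]   # sequence quoted in the text for ST Vanessa
inconsistent = [72, 48, 60]             # invented sequence that reverses direction

print(trends_toward_recurvature(vanessa), trends_toward_recurvature(inconsistent))
```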

2. Straight-mover 

The final discriminant analysis model forecasts for TY Agnes are presented in
Fig. 21. TY Agnes tracked west-northwest under the influence of an easterly steering
flow along the south side of a broad mid- to low-level subtropical ridge that extended 
Fig. 20. Time-to-recurvature forecasts for recurving ST Vanessa. Discriminant
analysis model forecasts (top number) and verifying time (bottom) to recurvature (h) 
at the JTWC best track 00 and 12 UTC positions during 22-31 October 1984 (dots). 
The letters PR indicate a pre-recurvature situation of more than 96 h prior to re- 
curvature. 


from the dateline west to the coast of Vietnam (ATCR 1984). Seven of the nine fore- 
casts correctly predicted straight-track motion during the 72-h forecast period. Two 
forecasts of recurvature in 60 h are mispredictions of the track type. These two R-60h 
forecasts were made 48 and 60 h before landfall in Vietnam and 72 and 84 h before the
subsequent dissipation.
3. Odd-mover 

The forecast model in this study was not designed to distinguish odd-mover behavior
such as loops and stairstep tracks. Therefore, forecasts based on the vorticity
fields preceding or during erratic motion cannot provide accurate information on the 
storm's track. However, classifications may indicate storm motion if the next segment 
of the track fits either of the model’s straight or recurver track categories. 

The time-to-recurvature forecasts for ST Bill are shown in Fig. 22. Although 
ST Bill was expected to recurve similar to ST Vanessa, the complex environmental 
steering associated with an interaction with TY Clara caused Bill to track southeastward 
before dissipating east of the Philippines (ATCR 1984). In the first 48 h after the 
Fig. 21. Time-to-recurvature forecasts for TY Agnes. Discriminant analysis
model forecasts of time to recurvature (h) at the JTWC best track 00 and 12 UTC 
positions (dots) during 1-8 November 1984. Definition as a straight-moving storm 
requires a minimum of 72 h after the forecast time to ensure verification as a 
straight-mover. PRNR refers to the forecast model classification group for recurv- 
ing cases more than 96 h prior to recurvature time and straight-moving cases. 


tropical cyclone formation alert (TCFA), Bill tracked slowly in a 25 n mi (46 km) 
diameter cyclonic loop. Although the next track segment is straight, forecasts during 
Bill's first loop predict recurvature in 60 h (first forecast) to 72 h (second through fourth 
forecasts). Once the erratic looping is completed, the model correctly identifies the
straight-track segment in the next eight forecasts. As Bill began to recurve around the 
western end of the subtropical ridge, the midlatitude trough passed to the north and 
weakened the ridge, which slowed Bill’s progress. The intense low-level circulation in the 
Philippine Sea associated with TY Clara, and the strengthening northeast monsoon flow,
forced Bill to the southeast in an anticyclonic loop, and Bill rapidly weakened.

This set of model forecasts is unusual in that there is a sudden transition from 
straight-track predictions (R-96h) to the recurver predictions (R-24h, R-12h and R-00h). 
The model forecasts for recurving storms tend to transition more appropriately through
successive recurvature classification categories. The model classifies the vorticity fields 
during the anticyclonic loop as recurvature (R-00h) situations. Since the forecast model
is unable to predict looping or southeast motion, synoptic situations after Bill’s would-be
recurvature time (fifth R-00h forecast) are classified into the most similar of the ten 
straight plus recurver groups. While these recurvature forecasts correctly predict the 
recurvature-like motion as Bill moves northwest and then north and northeast, there is
no indication in the model forecasts that Bill will subsequently loop toward the
southeast. The last two forecasts of R-12h and R-36h are based on the synoptic situ- 
ation associated with Bill’s southeast motion and precede a small cyclonic loop. These 
last forecasts indicate that the situation has changed, but continue erroneously to predict 
recurvature. 
Fig. 22. Time-to-recurvature forecasts for ST Bill. Discriminant analysis model
forecasts of time to recurvature (h) are indicated at the JTWC best track 00 and 12 
UTC positions (dots) during 8-21 November 1984. 


B. POSTERIOR PROBABILITIES AS AN AID IN THE FORECAST DECISION 
The posterior probability is the probability that an individual case belongs to a 
group. The probabilities for all groups sum to one. The posterior probability (P) that
case i belongs to group j is computed from the Mahalanobis distance (D²) or directly
from the classification function score (S) for the ith case for the jth group:


$$P_{ij} = \frac{\exp(S_{ij})}{\sum_{k=1}^{g} \exp(S_{ik})} \qquad (5.1)$$

where g is the number of classification groups.


Posterior probabilities can be used subjectively by the forecaster to assess the likeli- 
hood that a classification is correct. If the posterior probability for one classification 
group is high relative to the probabilities for the remaining groups, the forecaster can 
have more confidence that the model forecast is correct. If the posterior probability for 
the classification group is low and nearly equal to the probabilities for one or more of 
the other groups, then the forecaster should have less confidence in the prediction. 
Posterior probabilities can also be useful when a classification is repeated at successive 
12-h forecasts to indicate whether the forecast is more or less likely to be correct. 
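The subjective comparison described above can be made concrete with a small sketch that reports the leading group, its posterior probability, and its lead over the runner-up. This is only one possible way to summarize the spread of the probabilities; the function name and the example values are assumptions for illustration, not part of the model.

```python
def forecast_confidence(posteriors):
    """Return the top group, its posterior, and its lead over the runner-up."""
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    (best, p_best), (_, p_next) = ranked[0], ranked[1]
    return best, p_best, p_best - p_next


# Hypothetical posteriors for two cases (illustrative values only).
clear_case = {"R-00h": 0.10, "R-48h": 0.15, "PRNR": 0.75}
close_case = {"R-00h": 0.30, "R-48h": 0.32, "PRNR": 0.38}

clear = forecast_confidence(clear_case)  # large lead: more confidence
close = forecast_confidence(close_case)  # small lead: less confidence
```

Both cases are classified as PRNR, but the small lead in the second case is the situation in which the forecaster should place less confidence in the prediction.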

The posterior probability would be more useful if some cutoff value existed that 
would indicate the forecast was likely to be correct. To examine whether this is the case 
for the discriminant analysis model, posterior probabilities for all cases classified into 
each 12-h forecast category are plotted as a function of the actual verification categories 
(Fig. 23). The ranges of the posterior probabilities vary with the forecast classification
group. Probabilities are highest for the R-00h and PRNR forecasts and are lowest for
the R-36h through R-84h forecasts. Unfortunately, the posterior probabilities are not 
distinctly higher for the correct predictions than for the incorrect predictions. Posterior 
probabilities for correct classifications are most distinct from incorrect classifications 
when PRNR is forecast. Therefore, posterior probabilities are most useful in evaluating 
PRNR forecasts. 

Posterior probabilities for recurving storm ST Abby are presented in Table 20. ST 
Abby continually tracked to the right of the 1983 JTWC official forecasts (ATCR 1983). 
Although the JTWC forecast aids and numerical progs had consistently indicated a 
west-northwest track for Abby, the subtropical ridge over Japan never intensified as 
anticipated and Abby recurved to the northeast. Sandgathe (1987) cites ST Abby as an 
unusual example of a cyclone-subtropical ridge interaction, defined as a “through-the- 
ridge” case, in which the cyclone unexpectedly moves through an apparently 
well-established subtropical ridge.
Fig. 23. Posterior probabilities of classifications into time-to-recurvature
groups. Posterior probabilities (ordinate) for the N cases forecast as R-00h (top
left), R-24h (top right), R-48h (middle left), R-72h (middle right), R-96h (bottom 
left), and PRNR (bottom right) plotted in the verifying groups R-00h through 
PRNR (abscissa). Vertical lines indicate the correct classifications. 
On 5 August 1983, the discriminant analysis model correctly forecasts Abby’s
straight-track motion during the next 72 h and the posterior probability (35%) is rela- 
tively high. Referring to Fig. 23, only one case in the learning set had a PRNR forecast 
with a posterior probability greater than 35% and then recurved within the 72-h forecast 
period (R-12h). Therefore, the 35% posterior probability indicates that it is highly likely 
that the PRNR forecast is correct. Similarly, the second PRNR forecast has a relatively 
high posterior probability for the PRNR classification, which indicates the reliability of 
the PRNR forecast. Although the third PRNR forecast correctly predicts straight-track
motion during the next 72-h period, the posterior probability that it belongs to that 
group is only 21%. Based on the learning set results in Fig. 23, a forecaster would have 
relatively less confidence in this PRNR forecast (line 3) than in the previous two PRNR 
forecasts (lines 1 and 2). However, the model continues to predict Abby as a straight- 
mover (or at least 84 h to recurvature) throughout the remainder of the recurvature pe- 
riod. The small posterior probability values indicate that the erroneous straight-track 
PRNR predictions are not likely to be correct. 


Table 20. DISCRIMINANT ANALYSIS MODEL FORECASTS FOR ST
ABBY: Month-day-times from 5-9 August 1983 are indicated in the
DTG column. Verification times to recurvature are given in the VERF
column. The prediction of the most likely classification group (time to
recurvature in hours or PRNR) is based on the highest classification
function score and corresponds to the highest posterior probabilities
given in the columns labeled 00 through PRNR.


MODEL CLASSIFICATION GROUP
DTG VERF PRED
080500 PRNR PRNR
080512 PRNR
080600 PRNR
080612 PRNR
080700 PRNR
080712 PRNR
080800 84
080812 PRNR
080900
080912
VI. SUMMARY AND CONCLUSIONS


The feasibility of using an empirical orthogonal function (EOF) representation to 
identify the synoptic vorticity associated with tropical cyclone recurvature is examined. 
Recurvature, which is defined as a change in storm heading from west to east of 000°
N, is evaluated from the Joint Typhoon Warning Center best track positions. In this
EOF approach, the vorticity field is represented by the sum of 45 orthogonal 
eigenvectors that represent spatial patterns. Time-dependent coefficients are derived 
that indicate the importance of each pattern in the map series. The EOF coefficients are 
derived by Gunzelman (1990) from the 12-hourly U.S. Navy Global Band Analyses at 
700, 400 and 250 mb for 1979-1984 western North Pacific tropical cyclones. The first 
45 modes account for 73-78% of the variance in the relative vorticity fields. 
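The relation between a vorticity map, the spatial patterns, and the coefficients can be sketched as a projection onto orthonormal vectors. The toy example below assumes a four-point grid and two hand-picked orthonormal patterns; these values and the function names are hypothetical and stand in for the 45 eigenvectors derived by Gunzelman (1990). The fraction computed here applies to a single map, whereas the 73-78% figure quoted above refers to the full map series.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def eof_coefficients(field, patterns):
    """Project one map onto each orthonormal spatial pattern."""
    return [dot(field, p) for p in patterns]


def reconstruct(coeffs, patterns):
    """Rebuild the map as the coefficient-weighted sum of the patterns."""
    n = len(patterns[0])
    return [sum(c * p[i] for c, p in zip(coeffs, patterns)) for i in range(n)]


def represented_fraction(field, coeffs):
    """Share of the map's squared amplitude captured by the retained modes."""
    return sum(c * c for c in coeffs) / dot(field, field)


# Toy data: two orthonormal patterns on a four-point grid (assumed values).
patterns = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, -0.5, -0.5]]
field = [3.0, 1.0, -1.0, 1.0]

coeffs = eof_coefficients(field, patterns)
approx = reconstruct(coeffs, patterns)
fraction = represented_fraction(field, coeffs)
```

Retaining more modes brings the reconstructed map closer to the original, which is why the truncation at 45 modes leaves part of the variance unrepresented.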

The classification goals are two-fold: first, to identify tropical cyclone motion during 
the 72-h forecast period as either straight or recurving; and second, to forecast the time
to recurvature with 12-h accuracy. The time series of the first and second EOF coeffi- 
cients for recurving storms vary in a systematic manner as the tropical cyclone moves 
around the subtropical ridge. In contrast, the coefficients for straight-moving storms 
tend to cluster about different mean EOF 1-2 values. Taking this Euclidean distance 
approach, additional EOF predictors are identified that best separate recurvers and 
straight-movers in multidimensional EOF space. Classification of an individual case is 
then into the closest 12-h time-to-recurvature group or straight-mover category as 
measured in multidimensional EOF space. The Euclidean approach provides physical 
insight into the classification problem and demonstrates skill relative to climatological 
forecasts. However, there is no objective method of determining the optimum set of 
predictors or weighting the individual predictors in the model according to their significance
in separating among the classification groups.
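A minimal sketch of the Euclidean classification idea follows, assuming that each group is summarized by the mean of its learning-set coefficients and that a new case is assigned to the group with the nearest mean. The group labels, the two-coefficient vectors, and the function names are illustrative assumptions; the predictors in this unweighted form all count equally, which is the limitation noted above.

```python
import math


def group_means(learning_set):
    """Average the coefficient vectors of each group in the learning set."""
    means = {}
    for group, vectors in learning_set.items():
        n = len(vectors)
        means[group] = [sum(column) / n for column in zip(*vectors)]
    return means


def classify(coeffs, means):
    """Assign a case to the group whose mean is closest in EOF space."""
    return min(means, key=lambda group: math.dist(coeffs, means[group]))


# Hypothetical learning set using two EOF coefficients per case.
learning_set = {
    "R-00h": [[1.0, 2.0], [3.0, 2.0]],
    "straight": [[-2.0, -1.0], [-4.0, -3.0]],
}
means = group_means(learning_set)

first = classify([1.5, 1.0], means)
second = classify([-2.5, 0.0], means)
```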

A more objective discriminant analysis technique is employed to more fully exploit 
the predictive capabilities of these EOF coefficients. In this approach, the entire set of 
782 cases from 97 recurving and straight-moving tropical cyclones is used to both derive 
and test the recurvature model classifications. A final 250 mb discriminant analysis 
model is useful (72% correct) in identifying recurving (80%) and straight (66%) motion 
during the 72-h forecast period. Skill in distinguishing among the 12-h time-to-recurvature
groups (R-00h through R-96h) plus the combined straight-mover and recurving
storm cases more than 96 h prior to recurvature (PRNR) is only 60, 29, 29, 12, 13, 22,
19, 7, 29, and 47%, respectively. While these results represent improvement over the
Euclidean model forecasts, the skill in identifying the time to recurvature is less than 
desired for operational use. The relatively poor skill in classifying cases in the interme- 
diate time to recurvature categories is attributed to the high variability among the 
synoptic fields that precede recurvature. Better skill (79% correct) in identifying storm 
motion during the 72-h forecast period can be achieved if classifications are only into 
two groups (recurver versus straight), rather than into the nine 12-h time-to-recurvature
groups plus PRNR. Thus, the number and composition of the classification groups 
must be a trade-off between the forecaster’s need to specify a precise time of recurvature 
versus the diminishing skill as more time precision is attempted in the forecast model. 
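The trade-off can be illustrated by scoring the same forecasts at a coarser resolution. The sketch below assumes that a forecast of 72 h or less to recurvature counts as recurving motion during the 72-h period and that R-84h, R-96h, and PRNR count as straight motion; that mapping, the label format, and the sample forecast-verification pairs are assumptions for illustration and do not reproduce the thesis results.

```python
def motion_class(group):
    """Map a time-to-recurvature label to motion during the 72-h period."""
    if group == "PRNR":
        return "straight"
    hours = int(group[2:-1])  # "R-48h" -> 48
    return "recurver" if hours <= 72 else "straight"


def percent_correct(pairs):
    """Percent of (forecast, verification) pairs with matching motion class."""
    hits = sum(motion_class(f) == motion_class(v) for f, v in pairs)
    return 100.0 * hits / len(pairs)


# Hypothetical (forecast, verification) pairs.
pairs = [
    ("R-24h", "R-48h"),  # wrong 12-h group, correct motion class
    ("PRNR", "R-96h"),   # both count as straight during the next 72 h
    ("R-84h", "R-60h"),  # forecast straight, storm recurves within 72 h
    ("R-00h", "R-00h"),  # correct at both resolutions
]
two_class_score = percent_correct(pairs)
exact_score = 100.0 * sum(f == v for f, v in pairs) / len(pairs)
```

In this toy sample only one forecast matches the exact 12-h group, while three of the four are correct once the groups are merged, which mirrors the gain in skill reported when less time precision is demanded.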

The EOF coefficients for 250 mb vorticity provide the best time-to-recurvature 
forecast skill. The coefficients for this pressure level are statistically the most distinct 
among the time-to-recurvature groups and the 250 mb eigenvectors represent more variance
in the vorticity fields than those for the other two pressure levels. In addition, the
magnitude of the vorticity of the subtropical ridge increases with height and is greatest
at 250 mb. The 700 mb coefficients provide the next best model skill. Although the
eigenvectors for this pressure level account for less variance than those for 400 mb, the
relative vorticity gradients between the cyclone and the subtropical ridge are greatest at 
700 mb. Since more reliable data are available over open ocean areas at the upper levels 
from pilot reports and satellite-derived winds, the individual 12-hourly cases should be 
better defined and better forecast at 250 mb. 

Since no classification groups are included for odd-mover motion, such as loops and
stairsteps, these types of tracks are forecast into the most similar time-to-recurvature
group. For example, an anticyclonic loop might be classified as recurvature. Perhaps 
the EOF representation of synoptic vorticity will not be able to identify the precise type 
of odd-mover motion resulting from the smaller and faster time scale forcing mech- 
anisms such as multiple storm interactions. Thus, distinction between a storm that will 
merely step or loop to the northeast and one that will continue recurvature motion to 
the northeast is needed. 

The results from these feasibility tests indicate the usefulness of an EOF represen- 
tation of synoptic vorticity at one pressure level. Better skill may be achieved if the EOF 
coefficients for more than one pressure level are used, or if this EOF representation of 
the synoptic fields is combined with other factors such as persistence and climatology.
Other analysis methods, such as multiple linear regression, that better exploit the time
trends in continuous data of this type should also be tested. As more data become
available, independent testing and stratification of the sample will be possible. One 
problem with this initial investigation is that it is assumed that only one set of vorticity 
patterns leads to recurvature. In fact, several distinct paths may be defined by the
time-dependent coefficients in multidimensional space. Such differences could be due to 
different forcing mechanisms associated with recurvature, or more simply due to the 
differences in the large-scale vorticity patterns with latitude. While these preliminary 
results in pinpointing the precise (12-h) time to recurvature are somewhat discouraging, 
other statistical techniques may prove more successful. 